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Abstract 

Recently,  a  framework  for  multiscale  stochastic  modeling  was  introduced  based  on 
coarse- to-fine  scale-recursive  dynamics  defined  on  trees.  This  model  class  has  some  at¬ 
tractive  characteristics  which  lead  to  extremely  efficient,  statistically  optimal  signal  and 
image  processing  algorithms.  In  this  paper,  we  show  that  this  model  class  is  also  quite 
rich.  In  particular,  we  describe  how  1-D  Markov  processes  and  2-D  Markov  random 
fields  (MRF’s)  can  be  represented  within  this  framework.  The  recursive  structure  of 
1-D  Markov  processes  makes  them  simple  to  analyze,  and  generally  leads  to  compu¬ 
tationally  efficient  algorithms  for  statistical  inference.  On  the  other  hand,  2-D  MRF’s 
are  well  known  to  be  very  difficult  to  analyze  due  to  their  non-causal  structure,  and 
thus  their  use  typically  leads  to  computationally  intensive  algorithms  for  smoothing  and 
parameter  identification.  In  contrast,  our  multiscale  representations  are  based  on  scale- 
recursive  models  and  thus  lead  naturally  to  scale-recursive  algorithms,  which  can  be 
substantially  more  efficient  computationally  than  those  associated  with  MRF  models. 
In  1-D,  the  multiscale  representation  is  a  generalization  of  the  mid-point  deflection  con¬ 
struction  of  Brownian  motion.  The  representation  of  2-D  MRF’s  is  based  on  a  further 
generalization  to  a  “mid-line”  deflection  construction.  The  exact  representations  of  2-D 
MRF’s  are  used  to  motivate  a  class  of  multiscale  approximate  MRF  models  based  on 
one- dimensional  wavelet  transforms.  We  demonstrate  the  use  of  these  latter  models  in 
the  context  of  texture  representation  and,  in  particular,  we  show  how  they  can  be  used 
as  approximations  for  or  alternatives  to  well-known  MRF  texture  models. 
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1  Introduction 


In  this  paper,  we  describe  how  to  use  a  class  of  multiscale  stochastic  models  to  represent  1-D 
Markov  and  reciprocal  processes  and  2-D  Markov  random  fields  (MRF’s).  Markov  models  in 
one  dimension  provide  a  rich  framework  for  modeling  a  wide  variety  of  biological,  chemical, 
electrical,  mechanical  and  economic  phenomena  [7].  Moreover,  the  Markov  structure  makes 
the  models  very  simple  to  analyze,  so  that  they  often  can  be  easily  applied  to  statistical 
inference  problems  (such  as  detection,  parameter  identification  and  state  estimation)  as  well 
as  problems  in  system  design  (e.g.  control  and  queuing  systems). 

In  two  dimensions,  MRF’s  also  have  been  widely  used  as  models  for  physical  systems 
[3,  4,  39,  23],  and  more  recently  for  images.  For  example,  Gaussian  fields  [45]  have  been 
used  as  image  texture  models  [17,  29,  10,  37],  and  the  more  general  Gibbs  fields  have 
been  used  as  prior  models  in  image  segmentation,  edge  detection  and  smoothing  problems 
[5,  25,  40,  38].  Causal  sub-classes  of  MRF’s,  such  as  Markov  Mesh  Random  Fields  [1,  21]  and 
Non-Symmetric  Half-Plane  Markov  chains  [28]  lead  to  two-dimensional  versions  of  Kalman 
filtering  algorithms  when  the  fields  are  Gaussian  [46].  In  addition,  efficient  fast  Fourier 
transform  algorithms  are  available  for  stationary  Gaussian  fields  defined  on  toroidal  lattices 
[18,  29,  11].  In  general,  however,  Markov  random  field  models  lead  to  computationally 
intensive  algorithms  (e.g.  stochastic  relaxation  [25])  for  estimation  problems.  In  addition, 
parameter  identification  is  difficult  for  MRF  models  due  to  the  problem  of  computing  the 
partition  function  [4,  41].  Thus,  while  Markov  random  fields  provide  a  rich  structure  for 
multidimensional  modeling,  they  do  not  generally  lead  to  the  simple  analysis  and  compu¬ 
tationally  efficient  algorithms  that  1-D  Markov  processes  do. 

These  computational  issues  are  the  most  important  obstacle  to  the  application  of  MRF 
models  to  a  broader  range  of  problems,  and  are  the  principal  motivations  for  the  in¬ 
vestigation  in  this  paper  of  the  richness  of  the  class  of  multiscale  stochastic  processes 
[15,  13,  14,  8,  9],  and  in  particular  of  how  such  multiscale  processes  can  be  used  to  exactly 
and  approximately  represent  Markov  random  fields.  Our  multiscale  stochastic  processes 
are  described  by  scale-recursive  models,  which  lead  naturally  to  computationally  efficient 
scale-recursive  algorithms  for  a  variety  of  estimation  problems.  For  instance,  fast  smoothing 
algorithms  are  developed  for  a  class  of  Gaussian  processes  in  [15,  13,  14].  Also,  Bouman 
and  Shapiro  demonstrate  how  a  related  multiscale  discrete  random  field  leads  to  an  efficient 
sequential  MAP  estimator  [8,  9].  In  this  paper,  we  demonstrate  how  a  simple  generalization 
of  the  models  in  [15,  13,  14]  leads  to  classes  of  models  which  can  be  used  to  represent  all 
1-D  Markov  processes  and  2-D  Markov  random  fields.  The  significance  of  this  result  is  that 
it  suggests  that  this  multiscale  modeling  framework  may  be  a  decidedly  superior  basis  for 
image  and  random  field  modeling  and  analysis  than  the  MRF  framework  both  because  of 
the  efficient  algorithms  it  admits  and  because  of  the  rich  class  of  phenomena  it  can  be  used 
to  describe. 

The  efficient  algorithms  to  which  the  multiscale  framework  leads  and  which  motivate 
our  work  here  have  already  led  to  interesting  and  substantial  new  developments  in  a  num¬ 
ber  of  areas.  In  addition  to  the  work  on  image  segmentation  described  in  [8,  9],  in  [35] 
we  exploit  the  multiscale  framework  to  develop  new  and  efficient  algorithms  for  estimating 
optical  flow  in  image  sequences.  Standard  formulations  of  this  problem  require  the  compu¬ 
tationally  intensive  solution  of  an  elliptic  partial  differential  equation  which  arises  from  the 
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often  used  “smoothness  constraint”  type  regularization,  corresponding  to  regularizing  the 
problem  with  an  MRF  prior.  We  utilize  the  interpretation  of  the  smoothness  constraint  as  a 
“fractal  prior”  to  motivate  regularization  based  on  one  of  our  multiscale  models.  The  result 
is  a  slightly  different  prior  model,  which  yields  comparable  root-mean-square  (rms)  error 
performance  to  that  achieved  using  the  standard  MRF  prior  model,  but  with  drastically 
reduced  computational  load.  In  particular,  in  contrast  to  the  iterative  algorithms  needed 
to  solve  the  elliptic  equations  corresponding  to  the  MRF  prior,  our  multiscale  algorithm 
is  scale-recursive ,  yielding  the  optimal  estimate  in  a  finite  number  of  steps  with  constant 
per  pixel  computational  load.  Figure  1,  taken  from  [35],  is  representative  of  the  results. 
Here,  the  rms  error  is  plotted  for  our  multiscale  regularization  (MR)  algorithm  and  versus 
iteration  for  two  iterative  methods  for  solving  the  MRF-based  estimation  problem.  Since 
the  multiscale  method  is  not  iterative,  its  rms  performance  is  plotted  as  a  horizontal  line. 
The  entire  multiscale  algorithm  has  a  computational  load  roughly  equal  to  4.2  iterations 
of  the  iterative  successive  over-relaxation  (SOR)  algorithm,  indicating  a  considerable  com¬ 
putational  savings.  Moreover,  not  only  does  this  computational  savings  grow  with  image 
size  (because  of  the  constant  per  pixel  complexity  of  our  approach)  but  also  this  algorithm 
yields  error  covariance  information  as  part  of  its  computation,  something  that  is  not  feasible 
in  the  smoothness  constraint  formulation  and  that  can  be  used  to  determine  the  optimal 
resolution  for  flow  estimation  at  each  point  in  the  image  frame  (see  [35]).  Based  on  this 
evidence  of  its  promise  in  practice,  it  is  natural,  then,  to  ask  the  question  of  how  rich  a  class 
of  phenomena  can  this  multiscale  formalism  capture?  The  answer  provided  in  this  paper  is 
that  this  class  is  extremely  rich  indeed. 

The  multiscale  representations  developed  here  rely  on  a  generalization  of  the  mid-point 
deflection  technique  for  constructing  a  Brownian  motion  in  one  dimension  [20,  24,  33].  To 
construct  a  Brownian  motion  sample  path  over  an  interval  by  mid-point  deflection,  we  start 
by  randomly  choosing  values  for  the  process  at  the  mid-point  and  the  two  boundary  points  of 
the  interval  according  to  the  joint  probability  distribution  implied  by  the  Brownian  motion 
model.  We  then  use  these  three  values  to  compute  the  expected  values  of  the  Brownian 
motion  at  the  one-fourth  and  three-fourths  points  of  the  interval.  The  expected  value  at 
the  one-fourth  (three-fourths)  point  corresponds  to  the  average  of  the  initial  and  mid-point 
values  (mid-point  and  final  values)  as  shown  in  the  upper  left  of  Figure  2.  Random  values, 
with  appropriate  error  variances,  are  then  added  to  the  predictions  at  each  of  these  new 
points.  The  critical  observation  to  be  made  here  is  that,  since  the  Brownian  motion  process 
is  a  Markov  process,  its  value  at  the  one-fourth  point,  given  the  values  at  the  initial  point 
and  mid-point  is  independent  of  the  process  values  beyond  the  mid-point,  in  particular  the 
values  at  the  three-fourths  and  end-points  of  the  interval.  Obviously,  it  is  also  the  case 
that  the  value  at  the  three-fourths  point  is  independent  of  the  values  at  the  initial  and 
one-fourth  points,  given  the  values  at  the  mid-point  and  final  point.  Consequently,  the 
random  deflection  terms  used  to  generate  the  values  of  the  Brownian  motion  at  the  one- 
fourth  and  three-fourths  points  can  be  chosen  independently.  In  addition,  we  see  that  the 
Markov  property  of  Brownian  motion  allows  us  to  iterate  this  process,  generating  values  at 
increasingly  dense  sets  of  dyadic  points  in  the  interval. 

There  are  several  important  observations  to  be  made  about  the  preceding  development. 
The  first  is  that,  by  linearly  interpolating  at  each  level  in  this  procedure,  as  illustrated  in 
Figure  2,  a  sequence  of  continuous  approximations  of  a  Brownian  motion  is  constructed,  and 
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the  statistics  of  these  approximations  converge  to  those  of  a  Brownian  motion  [20].  Indeed, 
this  sequence  of  linear  spline  approximations  can  be  interpreted  exactly  as  a  non-orthogonal 
multiscale  approximation  using  as  the  scaling  function  the  triangular  “hat”  function  [42] 
which  is  the  integral  of  the  Haar  wavelet  [24].  Second,  as  we  will  see,  the  structure  of 
this  mid-point  deflection  construction  fits  precisely  into  the  multiscale  modeling  framework 
developed  in  [15,  13,  14],  and  corresponds  simply  to  a  particular  choice  of  the  parameters 
in  the  multiscale  model.  Moreover,  this  concept  generalizes,  allowing  us  to  show  that  all 
1-D  reciprocal  and  Markov  processes  can  be  represented  by  multiscale  stochastic  models  in 
a  similar  way.  Thus,  in  one  dimension  we  will  show  that  the  class  of  processes  realizable  via 
multiscale,  scale-recursive  models  is  at  least  as  rich  as  the  class  of  all  Markov  and  reciprocal 
processes.  In  fact,  as  we  will  illustrate,  it  is  significantly  richer  than  this. 

Furthermore,  these  ideas  can  be  extended  to  multidimensional  processes.  In  particular, 
we  show  how  a  generalization  of  the  mid-point  deflection  concept  to  a  “mid-line”  deflection 
construction  can  be  used  to  represent  all  2-D  MRF’s  with  multiscale  models.  In  particular, 
the  key  to  our  multiscale  representations  in  one  or  two  dimensions  is  a  partitioning  of  the 
domain  over  which  the  process  is  defined  so  that  the  coarse-to-fine  construction  of  the 
process  can  proceed  independently  in  each  subdomain.  Markovianity  plus  knowledge  of  the 
process  on  the  boundaries  of  the  subdomain  partition  make  this  possible.  The  fundamental 
difference,  however,  between  the  1-D  and  2-D  cases  is  due  to  the  fact  that  boundaries  in 
V?  correspond  to  curves  or  in  Z2  to  sets  of  connected  lattice  sites,  as  opposed  to  pairs 
of  points  in  one  dimension.  Because  of  this  difference,  exact  multiscale  representations  of 
MRF’s  defined  over  a  subset  of  Z2  have  a  dimension  which  varies  from  scale  to  scale,  and 
which  depends  on  the  size  of  the  domain  over  which  the  MRF  is  defined. 

In  addition  to  the  exact  representations,  we  will  introduce  a  family  of  approximate 
representations  for  Gaussian  MRF’s  (GMRF’s)  based  on  wavelet  transforms.  As  we  have 
indicated,  maintaining  complete  knowledge  of  a  process  on  2-D  boundaries  leads  to  models 
of  scale-varying  dimension,  which  can  become  prohibitively  large  for  domains  of  substantial 
size.  On  the  other  hand,  at  coarser  scales,  it  would  seem  reasonable  to  keep  only  coarse 
approximations  to  these  boundary  values,  and  this  leads  naturally  to  the  use  of  a  multiscale 
change  of  basis  for  the  representation  of  the  values  of  a  2-D  process  along  each  1-D  boundary. 
That  is,  through  our  mid- line  deflection  based  models,  we  are  led  to  the  idea  of  using  one¬ 
dimensional  wavelet  transforms  in  the  representation  of  the  values  of  a  two-dimensional 
GMRF.  The  result  is  a  family  of  models,  ranging  from  those  w'hich  keep  only  the  coarsest 
wavelet  coefficients  along  each  1-D  boundary  to  the  exact  model  which  keeps  them  all.  This 
family  of  approximate  representations  allows  one  to  tradeoff  the  complexity  and  accuracy 
of  the  representations,  while  also  providing  a  framework  for  the  development  of  extremely 
efficient  estimation  and  likelihood  calculation  algorithms.  We  demonstrate  our  framework 
for  wavelet-based  approximate  representation  of  Gaussian  MRF’s  in  the  context  of  natural 
texture  representation  [10,  17,  16,  29,  37]. 

This  paper  is  organized  as  follows.  Section  2  describes  the  class  of  multiscale  stochastic 
models  that  we  use.  Section  3  develops  the  details  of  the  representation  of  Brownian  motion 
discussed  above,  and  generalizes  this  idea  to  allow  the  representation  of  all  1-D  Markov  and 
reciprocal  processes.  Section  4  then  describes  how  these  ideas  can  be  further  generalized 
to  provide  exact  and  approximate  representations  of  MRF’s.  Section  5  illustrates  how 
the  approximate  models  can  be  used  to  represent  GMRF  texture  models.  In  our  opinion, 
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one  of  the  conclusions  that  can  be  drawn  from  these  results  is  that  this  multiscale  modeling 
framework  holds  substantial  promise  as  an  alternative  to  the  MRF  framework  as  it  possesses 
advantages  both  in  terms  of  the  efficient  optimal  algorithms  it  leads  to  and  in  the  expressive 
power  it  holds.  Although  a  number  of  interesting  and  substantive  problems  remain  to  be 
investigated,  practical  applications  of  the  framework  are  already  emerging  and  several  of 
these,  as  well  as  the  conclusions  of  this  paper,  are  discussed  in  Section  6. 

2  Multiscale  Stochastic  Models 

In  this  section  we  describe  the  classes  of  multiscale  stochastic  models  to  be  used  in  this 
paper.  A  class  of  models  for  Gaussian  processes  is  described  first,  followed  by  a  general¬ 
ization  allowing  for  more  general  (non- Gaussian)  processes.  For  simplicity,  in  this  section 
we  introduce  the  basic  structure  and  form  of  our  models  in  the  context  of  representing  1-D 
signals  and  processes.  The  extension  of  the  models  to  2-D  is  conceptually  straightforward, 
adding  only  notational  and  graphical  complexity,  and  thus  we  defer  the  introduction  of  this 
extension  until  Section  4,  where  it  is  needed. 

2.1  Gaussian  Multiscale  Models 

The  models  presented  here  and  introduced  in  [15,  13,  14]  describe  multiscale  Gaussian 
stochastic  processes  indexed  by  nodes  on  the  dyadic  tree  in  Figure  3.  Different  levels  of  the 
tree  correspond  to  different  scales  of  the  process.  In  particular,  the  2m_1  values  at  the  mth 
level  of  the  tree  are  interpreted  as  information  about  the  mth  scale  of  the  process,  where  the 
notion  of  “information”  at  this  point  is  abstract.  For  instance,  values  of  the  process  at  level 
m  may  correspond  to  averages  of  pairs  of  values  at  level  m+ 1.  In  this  case,  one  can  interpret 
the  values  of  the  multiscale  process  as  scaling  coefficients  in  a  Haar  wavelet  representation  of 
the  process  at  the  finest  scale  [42],  However,  there  are  many  other  possible  interpretations  of 
the  information  represented  at  each  level  in  the  tree.  For  example,  values  of  the  multiscale 
process  at  a  certain  level  could  also  correspond  to  new  details  of  the  process  not  present 
at  coarser  resolutions.  In  this  case,  the  process  values  would  be  interpreted  as  the  wavelet 
coefficients  in  a  wavelet  representation  of  a  1-D  function  or  sequence.  Alternatively,  the 
values  at  different  levels  may  correspond  to  decimated  versions  of  the  process  at  the  finest 
scale.  As  we  will  see,  this  latter  interpretation  applies  to  our  multiscale  representations 
of  reciprocal  processes  and  MRF’s,  although  the  representations  can  also  be  interpreted  in 
terms  of  scaling  coefficients  corresponding  to  particular  non-orthogonal  expansions. 

We  denote  nodes  on  the  tree  with  an  abstract  index  s,  and  define  an  upward  shift 
operator  7  such  that  sj  is  the  parent  of  node  .s,  as  illustrated  in  Figure  3.  Also,  we  define 
the  scale  of  node  s,  i.e.  the  level  of  the  node,  as  m(s).  The  stochastic  tree  process  xs  £  TZn 
is  then  described  via  the  following  scale-recursive  dynamic  model: 

Xg  AgXgy  “j“  BgWg  (  1  ) 

under  the  assumptions6  x0  ~  A/^CFPo)  and  ws  ~  Af(0,I),  w'here  wa  €  TZm  and  As  and  Bs 
are  matrices  of  appropriate  size.  The  state  variable  x0  at  the  root  node  of  the  tree  provides 

6The  notation  x  ~  means  the  random  vector  x  is  normally  distributed  with  mean  vector  m  and 

covariance  matrix  P. 
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an  initial  condition  for  the  recursion.  The  driving  noise  w,  is  white  and  is  independent  of 
the  initial  condition.  Interpreting  each  level  as  a  representation  of  one  scale  of  the  process, 
we  see  that  (1)  describes  the  evolution  of  the  process  from  coarse  to  fine  scales.  The  term 
AsxSj  represents  interpolation  or  prediction  down  to  the  next  level,  and  Bsws  represents 
new  information  added  as  the  process  evolves  from  one  scale  to  the  next.  The  choice  of 
the  parameters  As  and  Bs  and  their  dependence  (if  any)  on  the  node  s,  depends  upon  the 
particular  application  and  process  being  modeled  [15,  13,  14,  35].  In  the  context  of  this 
paper,  as  we  will  see,  the  parameters  of  the  model  (1)  are  determined  in  a  constructive 
fashion  in  order  to  represent  the  reciprocal  process  or  MRF  of  interest. 

Note  that  any  given  node  on  the  dyadic  tree  can  be  viewed  as  a  boundary  between  three 
subsets  of  nodes  (two  corresponding  to  paths  leading  towards  offspring  and  one  correspond¬ 
ing  to  the  path  leading  to  a  parent).  An  extremely  important  property  of  the  scale-recursive 
model  (1)  is  that  not  only  is  it  Markov  from  scale-to-scale,  but,  conditioned  on  the  value  of 
the  state  at  any  node,  the  values  of  the  states  defined  at  the  corresponding  three  subsets 
of  nodes  are  independent.  This  fact  implies  that  there  are  extremely  efficient  and  highly 
parallelizable  algorithms  for  optimal  estimation  and  likelihood  calculation  based  on  noisy 
measurements  y„  £  1ZP  of  the  process  of  the  form: 

Vs  —  1  (2) 

where  vs  ~  Af (0,  Rs)  and  the  matrix  Cs  can  specify,  in  a  very  general  way,  measurements 
taken  at  different  times  or  spatial  locations  and  at  different  scales  [2,  15,  13,  14,  34].  For 
example,  as  mentioned  in  the  Introduction,  the  extension  of  one  of  the  optimal  estimation 
algorithms  to  2-D  and  quadtrees  is  applied  in  [35]  to  develop  a  new  scale-recursive  approach 
to  dense  motion-field  estimation  in  image  sequences  that  is  considerably  faster  than  previ¬ 
ously  developed  algorithms.  In  addition,  the  likelihood  calculation  algorithm  can  be  used, 
together  with  the  results  presented  here,  for  texture  identification  [34,  36].  An  important 
point  about  these  algorithms,  which  is  of  particular  significance  for  2-D  processing,  is  that 
they  are  recursive  and  not  iterative,  and  in  fact  have  constant  complexity  per  data  point 
or  pixel.  This  is  in  sharp  contrast  to  the  usual  iterative  algorithms  associated  with  the 
processing  of  MRF’s  [25]. 


2.2  General  Multiscale  Models 


As  we  indicated  in  the  preceding  section,  a  basic  property  of  the  model  (1)  is  the  Marko- 
vianity  of  the  state  with  respect  to  the  ordering  structure  defined  by  the  dyadic  tree.  More 
precisely,  let  T*,i  =  1,2,3  denote  the  three  subsets  of  states  which  correspond  to  viewing 
node  s  as  a  boundary  between  the  three  subsets  of  nodes  corresponding  to  paths  leading 
towards  the  parent  and  two  offspring  nodes7.  Then, 


Pvl,vJ,v5l-.(Ti,Tj,T;|X.)  =  p„j|a,j(Tj|A3K,|^(T2|As)pu3|a!a(T^Y3) 


(3) 


By  requiring  only  this  property  to  hold,  we  obtain  a  much  wider  class  of  processes  than 
that  given  by  (1),  but  still  retain  the  essential  properties  leading  to  the  efficient  algorithms 

7  We  stress  the  difference  here  between  subsets  of  nodes  (e.g.  {si ,  S2 ,  •  ■  -})  subsets  of  states  (e.g. 
{®»i ,  x,,,  ■  •  •}). 
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mentioned  above.  In  particular,  the  property  (3)  not  only  implies  that  the  tree  processes 
are  Markov  in  scale,  from  coarse-to-fine,  but  also  that  the  conditional  pdf  of  the  state  at 
node  s,  given  the  states  at  all  previous  scales,  depends  only  on  the  state  at  the  parent  node 
sj: 


Px,\x0}m(<r)<m(s){X 3 \Xtri  Ui(<t)  <  m(s))  —  Px,\x,^  (A» l-^Vy)  (^) 

Such  tree  processes  are  naturally  defined  by  specifying  the  parent-offspring  conditional 
pdf’s,  along  with  a  pdf  for  the  state  at  the  root  node  of  the  tree.  A  simple  example  of 
a  stochastic  process  in  this  general  class  is  the  following  discrete-state  stochastic  process 
xs  €  {0, 1,  ■  •  • ,  L}  with  parent-offspring  conditional  probability  mass  functions  given  by: 


Px,  |  (XS|X^) 


0m(s)  if  A~s  — 

(1  -em{s))/L  if  Xs  7^  XSy 


(5) 


where  p^Xo)  =  l/(L  +  1)  for  X0  6  {0, 1,  •  •  • ,  L}  and  9m(„)  is  a  number  between  0  and  1 
which  may  vary  with  scale  m(s).  A  class  of  processes  with  this  structure  and  defined  on  a 
quadtree  has  been  proposed  by  Bouman  for  segmentation  applications  [8,  9]. 

Finally,  we  stress  that  while  (3)  implies  that  a  tree  process  is  Markov  in  scale,  the  set 
of  states  xm  at  scale  m,  viewed  as  a  sequence  of  length  2m~'1  is  not  Markov  for  an  arbi¬ 
trarily  chosen  set  of  parent-offspring  pdf’s.  This  point  can  be  appreciated  by,  for  example, 
computing  the  joint  pdf  for  the  four  values  at  the  third  level  of  the  multiscale  process  given 
by  (5),  and  directly  checking  the  conditions  required  for  Markovianity  of  the  single  level 
sequence8.  However,  as  we  show  in  the  next  section,  the  parent-offspring  conditional  pdf’s 
can  be  chosen  such  that  the  finest  level  of  the  tree  process  can  be  used  to  represent  any  1-D 
Markov  or  reciprocal  process,  with  higher  levels  in  the  tree  corresponding  to  representations 
of  the  process  at  coarser  resolutions. 


3  Representation  of  1-D  Reciprocal  and  Markov  Processes 

In  this  section  we  describe  the  basic  properties  of  reciprocal  processes  in  one  dimension,  in¬ 
troduce  and  develop  representations  of  reciprocal  processes  in  terms  of  multiscale  stochastic 
models,  and  present  several  examples. 

3.1  1-D  Reciprocal  Processes 

A  reciprocal  process  is  a  first-order  MRF  on  the  real  line.  More  precisely,  a  stochastic  process 
zt,t  €  71  is  reciprocal9  if  it  has  the  property  that  the  conditional  probability  distribution 
of  a  state  in  any  open  interval  (Tx,T2),  conditioned  on  the  states  outside  of  this  interval, 
depends  only  on  the  boundary  states  zj1 ,  zx2  [22,  32].  That  is,  for  t  £  (Ti,!^): 

Pzt\zT,re{T1,T2)c(Zt\Zr,T  e  (Ti,T2)c)  =  Pzt\zTl,zT2(Zt\ZTnZT2)  (6) 

®The  process  is  Markov  only  if  0TO(,)  =  1/(1  +1).  In  this  case,  the  values  of  the  process  at  any  level  in 
the  tree  are  independent  of  one  another. 

9The  discussion  here  refers  only  to  first- order  reciprocal  processes.  Extension  to  higher-order  processes 
is  straightforward  [22]. 
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where  ( T\,T2)C  denotes  the  complement  of  the  open  interval  (Ti,T2).  Reciprocal  processes 
defined  on  the  integers  Z  satisfy  the  same  property  with  the  continuous  interval  (Ti,T2) 
replaced  by  the  discrete  interval  {7i  +  1,  T\  +  2,  •  •  •  rT2  —  1} 

Reciprocal  processes  are  closely  related  to  the  class  of  Markov  processes.  A  process  zt  on 
1Z  or  Z  is  Markov  if  past  and  future  values  of  the  state  are  independent  given  the  present. 
This  means  that  for  t2  <  t2: 


Pz,3 |zn , h <t2 (^t3 1 <  t2)  -  P^t2(Zt3\Zt2)  (7) 

As  discussed  in  [1,  22],  if  a  process  is  Markov  then  it  is  also  reciprocal,  whereas  reciprocal 
processes  are  not  necessarily  Markov. 

3.2  Exact  Multiscale  Representations  of  1-D  Reciprocal  Processes 

In  the  introduction  we  described  a  construction  of  a  Brownian  motion  bt  over  the  unit 
interval  via  mid-point  deflection.  As  we  noted,  this  corresponds  precisely  to  one  of  the 
Gaussian  multiscale  stochastic  models  described  in  Section  2.  To  see  this,  consider  the 
following  process.  At  the  coarsest  level,  the  initial  state  x0  is  a  three-dimensional  vector 
whose  pdf  is  given  by  the  joint  pdf  for  the  values  of  a  Brownian  motion  at  the  initial,  middle 
and  final  points  of  the  interval: 


x0  = 


Po  = 


bo 
bo. 5 
61 

0  0 
0  0.5 
0  0.5 


~  A/”(0,  Po) 

0 

0.5 

1 


(8) 

(9) 


where  we  have  used  the  facts  that  b0  =  0,  bt  is  an  independent  increments  process,  and  for 
t\  <  t2>  bt2  -  btl  ~  Af(Q,t2  —  ti). 

Choosing  a  value  for  sc0  as  a  sample  from  this  distribution  corresponds  to  the  first  step 
in  the  mid-point  deflection  construction  of  Brownian  motion.  The  second  step  in  the  mid¬ 
point  deflection  construction  is  the  specification  of  values  for  the  Brownian  motion  at  the 
one-fourth  and  three-fourths  points.  In  the  context  of  our  multiscale  modeling  framework, 
we  define  two  state  vectors  at  the  second  level  of  the  dyadic  tree  in  Figure  3,  each  again  a 
3-tuple.  The  state  on  the  left  represents  the  values  of  the  Brownian  motion  at  the  initial, 
one-fourth  and  middle  points  of  the  interval,  [bo,  bo.25,  60.5])  and  the  state  on  the  right 
represents  the  corresponding  values  in  the  right  half-interval,  [60.5,  60.75,  61].  The  sample  at 
the  quarter  point  is  given  by  linear  interpolation  of  bo  and  bo, 5,  plus  a  Gaussian  random 
variable  with  variance  equal  to  the  variance  of  the  error  in  this  prediction: 


&0.25  —  2^0  T  ^O.b)  T  e0.25j  Co. 25  ~  -A/* (0 , 0.125)  (10) 

Likewise,  60.75  is  chosen  by  averaging  the  end  points  of  the  right  half- interval,  60,5  and  6j, 
and  adding  in  a  random  value,  independent  of,  and  identically  distributed  to,  the  deflection 
term  used  to  create  the  sample  at  the  one-fourth  point. 
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The  construction  of  &0.25  and  £>0,75  in  the  multiscale  model  is  precisely  the  same  as  the 
mid-point  deflection  construction  of  these  values.  Values  of  the  process  at  successively 
finer  sets  of  dyadic  points  are  generated  in  the  same  way.  At  the  mth  scale,  the  values  of 
the  process  at  t  —  k/2m,k  =  0,l,--',2m  are  represented  with  2m~ 1  state  vectors,  each 
containing  the  values  of  the  process  at  three  points,  as  shown  in  Figure  4.  At  any  level, 
each  state  is  a  linear  function  of  its  parent,  plus  an  independent  noise  term.  Thus,  this 
construction  fits  precisely  into  the  multiscale  modeling  framework  given  by  (1)  (see  Section 
3.3  for  the  precise  formulae  for  As  and  Ba). 

Representation  of  more  general  1-D  reciprocal  processes  via  multiscale  models  is  a  simple 
extension  of  the  above  idea.  To  construct  a  multiscale  model  for  a  particular  reciprocal  pro¬ 
cess  zt,  t  6  [0, 1],  start  by  choosing  the  state  at  the  coarsest  level  as  a  sample  from  the  joint 
distribution  PzQ,z0.Blz1(Zo,  Z0.5,  Z\).  This  generalizes  the  choice  in  the  construction  above 
in  which  the  state  at  the  top  level  is  chosen  using  the  Gaussian  distribution  corresponding 
to  a  Brownian  motion.  The  two  state  vectors  at  the  second  level  are  again  the  three- 
dimensional  vectors  [zo,  20.25  j  20.5]  and  [z0.5,  2o.75>  -i],  where  values  for  the  half-interval  mid¬ 
points  are  chosen  as  samples  from  the  conditional  distributions  pZo  2&\Z0iZ0  b(Zo.2s\Zo,  Z0.5) 
and  pZo  76u0  6|Zl (^0.75 \Zo.s j  respectively.  Since  the' process  is  reciprocal,  20.25  and  20.75 

are  conditionally  independent  given  the  state  at  the  first  level,  and  thus  the  modeling  struc¬ 
ture  fits  precisely  into  the  more  general  non-linear  model  class  described  in  Section  2.2, 

The  construction  above  assumes  that  the  process  is  defined  over  a  continuous  interval.  In 
practice,  we  are  typically  concerned  with  processes  zt  on  a  discrete  interval,  t  E  {0, 1,  •  •  • ,  T}. 
If  T  —  2n  for  some  integer  N ,  then  we  can  use  essentially  the  same  construction  as  for  the 
continuous  case  above.  Specifically,  Xq  =  [z0,  2jy2,  zt]  is  a  random  vector  chosen  from 
the  appropriate  distribution  for  the  process  of  interest.  The  states  at  the  second  level  are 
[zo,  zT/4,  zr/i]  and  [zjy2,  233-/4.  zp],  with  the  half- interval  mid-points  again  chosen  using  the 
appropriate  distribution.  Since  there  are  only  a  finite  number  of  points  in  the  discrete 
process,  only  a  finite  number  of  levels  are  needed  to  exactly  represent  it.  In  particular,  with 
T  —  2N ,  N  levels  are  required. 

There  are  several  observations  to  be  made  about  the  continuous  and  discrete-time  con¬ 
struction  we  have  just  described.  The  first  is  that  there  is  no  fundamental  difficulty  in 
choosing  a  point  other  than  the  mid-point  at  each  level  in  these  constructions.  For  example, 
in  the  construction  of  Brownian  motion,  starting  from  the  initial  set  of  points  represented 
in  the  root  node  state,  we  could  next  generate  any  pair  of  points  on  either  side  of  0.5,  e.g. 
&o.i  and  &o.7-  However,  the  regular  structure  implied  by  the  choice  of  mid-points  may  be 
of  some  value  for  processes  such  as  Brownian  motion  which  have  stationary  increments,  as 
they  lead  to  models  in  which  the  model  parameters,  such  As  and  Bs  in  (1),  have  very  simple 
and  regular  characterizations  as  a  function  of  node  s  and  scale  m(s).  This  regularity  in  turn 
leads  to  simplifications  in  the  structure  of  algorithms  for  estimation  and  signal  processing, 
requiring  fewer  distinct  gains  to  be  calculated  and,  if  parallel  implementation  is  considered, 
allowing  SIMD  (single  instruction,  multiple  data)  rather  than  MIMD  (multiple  instruction, 
multiple  data)  implementations. 

Secondly,  in  discrete-time,  there  will  always  be  at  least  some  degree  of  irregularity  in 
the  multiscale  model  if  the  process  is  defined  over  t  6  {0, 1,  •  •  • ,  T}  and  T  is  not  a  power  of 
two.  In  particular,  in  such  a  case  the  structure  of  the  tree  and/or  the  state  needed  in  the 
multiscale  representation  of  this  process  will  need  to  be  modified.  For  example,  consider 
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a  process  defined  over  t  £  {0, 1,  •  •  •,  10}.  In  this  case,  we  can  develop  a  model  of  the  type 
we  have  described  in  which  the  tree  is  of  non-uniform  depth  and  in  which  we  do  not  have 
mid-point  deflection  at  some  nodes,  as  indicated  in  Figure  5a  (e.g.  in  the  generation  of  the 
value  at  t  =  3  given  values  at  0  and  5).  Alternatively,  as  shown  in  Figure  5b,  we  may  be 
able  to  achieve  some  level  of  (and  perhaps  complete)  symmetry  by  generating  more  than 
one  new  point  at  some  nodes  (e.g.  in  Figure  5b  we  generate  values  at  both  t  —  2  and  t  —  3 
given  values  at  0  and  5). 

Furthermore,  as  we  have  indicated  previously,  while  our  development  has  focused  on 
first-order  reciprocal  processes,  the  extension  to  higher-order  models  is  straightforward. 
Indeed,  a  Kth-ordeT  model  defined  on  t  £  {1, 2,  •  •  • ,  K(T  +  1)},  where  T  is  a  power  of  2,  can 
be  accommodated  by  grouping  states  at  adjacent  points  into  sets  of  size  K .  Higher-order 
models  can  equivalently  be  represented  by  simply  redefining  the  state  of  the  process  zt  to 
be  a  vector  of  appropriate  dimension. 

The  representations  we  have  introduced  to  this  point  have  obvious  and  substantial  levels 
of  redundancy.  For  example,  the  value  of  zT/2  appears  in  the  state  vector  at  both  nodes 
at  the  second  level  of  the  multiscale  model  we  have  described  for  discrete-time  reciprocal 
processes.  More  generally,  at  the  mth  level  of  the  model  for  such  a  process  there  are  2m~1 
state  vectors  containing  a  total  of  3  X  2m~1  values,  only  2m  +  1  of  which  are  distinct.  This 
redundancy  is  actually  of  minimal  consequence  for  estimation  and  likelihood  calculation 
algorithms  based  on  these  models.  However,  it  is  also  easy  to  eliminate  the  redundancy 
using  a  simple  modification  of  the  construction  we  have  described.  In  particular,  we  may 
generate  two  internal  points  between  each  pair  of  points  at  each  stage  in  the  level-to-level 
recursion,  yielding  a  four-dimensional  state  vector.  For  example,  if  the  reciprocal  process 
is  defined  over  t  £  {1,  2,  •  •  • ,  16},  then  we  can  choose  the  non-redundant  set  of  state  vectors 
illustrated  in  Figure  6.  In  this  case,  a  first-order  reciprocal  process  is  represented  by  a 
process  with  a  four- dimensional  state.  In  general,  at  the  mth  level  of  such  a  representation, 
there  are  2m_1  state  vectors  representing  2m+1  distinct  values  of  the  process.  Again,  in 
the  situation  where  T  is  not  a  power  of  two,  some  irregularity  in  the  structure  will  be 
introduced. 

Once  we  allow  ourselves  to  consider  such  variants  on  the  original  mid-point  deflection 
construction  in  which  more  than  one  new  point  is  generated  between  each  pair  of  previously 
constructed  points,  we  see  immediately  that  it  is  possible  to  generate  multiscale  represen¬ 
tations  on  trees  that  are  not  dyadic.  For  example,  consider  a  reciprocal  process  defined  on 
t  £  {0, 1,  •  •  • ,  3^}.  This  process  is  most  conveniently  represented  on  the  regular  structure 
of  a  third-order  tree,  as  shown  in  Figure  7.  This  flexibility  of  the  modeling  framework  al¬ 
lows  the  possibility  of  considering  different  tradeoffs  in  terms  of  level  of  parallelization  and 
computational  power  of  individual  processors  when  implementing  estimation  and  likelihood 
calculation  algorithms  such  as  those  in  [2,  15,  13,  14,  35,  34,  36]. 

Finally,  it  is  of  interest  to  note  that  the  construction  we  have  described,  and  its  several 
variants,  can  be  interpreted  as  a  non-iterative  Gibbs  sampler.  The  Gibbs  sampler  intro¬ 
duced  in  [25]  is  an  iterative  algorithm  for  the  generation  of  sample  paths  of  MRF’s  on  a 
discrete  lattice.  For  1-D  discrete-time  reciprocal  processes  this  procedure  reduces  to  using 
the  nearest  neighbor  conditional  probability  functions  to  construct  a  Markov  chain  which 
has  an  asymptotic  distribution  equal  to  the  correct  distribution  of  the  process.  Specifically, 
at  each  step  of  the  procedure  we  modify  the  current  sample  path  by  replacing  the  value  at 
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some  point  in  time,  say  t0  with  a  random  value  chosen  according  to  the  conditional  distri¬ 
bution  for  the  process  at  that  point  given  the  current  values  of  the  sample  path  at  t0  —  1  and 
<o  +  l.  By  cycling  repeatedly  through  all  of  the  points,  the  sample  path  is  guaranteed  to 
converge  to  one  with  the  correct  statistics.  The  procedure  is  conceptually  simple  but  com¬ 
putationally  intensive,  since  the  Markov  chain  requires  many  transitions  for  the  probability 
function  to  converge.  In  contrast,  in  our  construction,  we  successively  generate  samples  at 
new  points  (e.g.  mid-points)  conditioned  on  values  at  previously  generated  points,  which 
are  not  nearest  neighbors  but  rather  boundary  points  that  partition  the  time  interval  of 
interest.  For  this  reason,  and  since  we  begin  at  the  root  node  with  a  decimated  set  of  values 
with  the  correct  distribution,  we  are  guaranteed  that  at  each  stage  the  decimated  process 
that  is  constructed  has  exactly  the  correct  distribution.  Thus,  with  this  structure  we  visit 
each  time  point  only  once  and  construct  a  sample  path  non-iteratively. 

In  fairness,  an  important  point  to  note  here  is  that  if  a  reciprocal  process  is  specified 
directly  in  terms  of  a  Gibbs  distribution  then  the  calculation  of  the  nearest  neighbor  pdf’s 
required  in  the  Gibbs  sampler  is  simple  [25].  The  question  then  is  whether  it  is  also  simple 
to  determine  the  conditional  pdf’s  —  e.g.  the  pdf  for  zj y2  given  z0  and  zt  —  needed  to 
implement  the  non-iterative,  multiscale  procedures  we  have  described.  As  we  have  seen  for 
Brownian  motion  and  as  we  illustrate  further  in  the  examples  below,  in  many  cases,  includ¬ 
ing  all  vector  Gauss-Markov  processes  and  Z-state  Markov  chains,  closed  form  expressions 
can  be  derived  for  the  multiscale  representations.  Further  1-D  examples  corresponding  to 
the  Ising  model  are  discussed  in  [34]. 

3.3  Examples 

In  this  section  we  discuss  several  examples  of  reciprocal  processes  and  their  multiscale  rep¬ 
resentations.  The  first  examples  describe  multiresolution  models  for  general  vector  Gauss- 
Markov  processes  specified  in  state-space  form  and  allow  us  to  illustrate  the  interpretation  of 
these  multiresolution  models  as  providing  approximations  based  on  non- orthogonal  expan¬ 
sions.  In  particular,  our  model  for  Brownian  motion  corresponds  to  the  use  of  the  so-called 
“hat”  function  [42]  in  this  expansion,  leading  to  linear  interpolation  between  dyadic  points, 
while  a  model  for  the  integral  of  Brownian  motion  leads  naturally  to  a  multiresolution 
approximation  using  cubic  interpolation. 

The  second  part  of  this  section  presents  several  discrete-state  examples,  the  first  of 
which  investigates  general  Z-state  Markov  chains  and  allows  us  to  make  contact  with  the 
models  used  in  [8,  9]  for  segmentation  applications.  The  second  example  is  a  general  two- 
state  process,  which  is  used  to  demonstrate  that  the  class  of  multiscale  models  is  in  fact  far 
richer  than  the  class  of  Markov  processes. 

3.3.1  Gauss-Markov  Processes 

Consider  a  vector  Gauss-Markov  process  defined  on  the  interval  [0, 1]  and  given  by10: 

it  —  Ftzt  +  Gt^t  (11) 

10While  we  focus  here  on  the  construction  of  multiscale  models  for  continuous-time  Gauss-Markov  pro¬ 
cesses,  an  exactly  analogous  set  of  calculations  can  be  performed  for  the  discrete-time  process  zt+i  = 
Ftzt  +  GtUt 
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where  z0  ~  jV(0,  IIo),  E{^t/x^}  =  I6t-r  and  E{/^t2o’}  =  0.  Define  the  state  transition 
matrix  as  $(t,r)  and  state  covariance  matrix  as  IIt  =  E{ztztr}  [20].  Also,  let  zt2 |tlit3  denote 
the  conditional  expectation  zta  given  the  states  ztl  and  z<3,  and  Pt2|tljt3  the  corresponding 
covariance.  It  is  easy  to  show  that  for  t\  <  t2  <  t3: 


Zh\ti,t3 
^*2  1*1 1^3 


lit,  <&(t2,  h)T 

T 

ntl 

Hji^(^3j  h)T 

-i  - 

$(^3j  f2)ni2 

$(^3)  *i)iitl 

Ut3 

Zt3 


(12) 


nt 


nt1$(<2,fi)r 

r 

ntl 

ntl  $(<3,  tx)T 

-i 

lit,  §(t2,  h)T 

$(<3,f2)H<2 

$(t3,  tx)nt, 

n  ts 

$(t3,  t2)nt2 

(13) 


Using  (12)  and  (13),  we  can  obtain  explicit  formulae  for  the  parameters  A*,  Bt  and  Po 
in  the  multiscale  model  (1)  as  follows.  Let  us  identify  the  abstract  index  s  with  a  pair  of 
numbers  (m,  ip)  which  denote  the  scale  and  horizontal  shift  of  the  node  s,  respectively.  The 
horizontal  shift  <p,  running  from  0  to  2m_1  —  1,  indexes  the  nodes  at  scale  m.  For  instance, 
the  root  node  is  associated  with  the  pair  (1,0),  and  the  left  and  right  nodes  at  the  first  level 
are  associated  with  (2,0)  and  (2,1),  respectively.  With  this  notation,  the  state  at  node  s  on 
the  tree  contains  the  values  of  the  process  zt  at  the  particular  three  points: 


xs  —  x(m.,ip) 


z2Vj2m 

z(2<fi+l)/2m 

z(2<p+2)/2m 


(14) 


From  the  description  of  the  general  construction,  the  form  of  the  matrix  As  in  (1)  is  clear: 


I 

0 

0  ' 

Ki 

K2 

0 

0 

I 

0 

'  0 

I 

0 

0 

Ki 

k2 

0 

0 

I 

if  ip  is  even 


if  p  is  odd 


(15) 


In  particular,  if  ip  is  even,  then  the  first  and  third  components  of  the  state  x3  in  (14) 
correspond  to  the  first  and  second  components  of  xs ^ .  Thus,  the  identity  matrices  in  (15)  for 
ip  even  simply  map  the  first  and  second  components  of  xs ?  to  the  first  and  third  components 
of  xs.  In  addition,  the  mid-point  prediction  of  £(2<^+i)/2m  is  just  a  linear  function  of  the 
first  two  components  of  the  parent  xSj,  which  is  expressed  via  the  matrices  K i  and  K2  in 
the  second  row  of  (15).  The  matrix  As  for  ip  odd  is  similar,  and  in  fact  is  just  a  “shifted” 
version  of  As  for  ip  even  (reflecting  the  fact  that  the  interpolation  down  to  the  state  on  the 
right  depends  on  the  last  two  components  of  x„y). 

The  gain  matrices  in  (15)  can  be  computed  directly  from  (12).  Using  standard  formulae 
for  the  inversion  of  a  block  2x2  matrix,  we  compute: 

K\  —  $(t2,ti)  +  $(<2j u)nt, $(t3> u)T(nt3  -  $(^3,u)nt1$(f3)U)T) 
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-nt2$(t3,t2)T{nt3  -  ^(t3,t1)ntl^(t3,t1)T)  1#(«3, *i) 

%2  =  —^(*2)  il)IIi1$(i3,  ii)r(IIj3  -  $(f3,  ^l)ntj$(^3,  ti)T )  1 

+ni2$(f3,  f2)r(nf3  —  $(f3,i1)n4l$(f3;f1):r)  1 

where  h  =  2<p/2m,t2  =  (2 tp  +  l)/2m  and  t3  =  (2 ip  +  2)/2m. 

Likewise,  the  matrix  Bs  in  (1)  has  the  following  block  structure: 


Bs  =  B 


0 

K3 

0 


(16) 

(17) 


(18) 


where  K3  is  any  matrix  such  that  K3K£  =  Ptj|tl)t3  and,  again,  fj  =  2ip/2m,t2  =  (2y?+ 1 )/ 2”1 
and  t3  =  (2 <p  +  2)/2m.  The  matrix  B,  in  (18)  reflects  the  fact  that  no  noise  is  added  to 
the  first  and  third  components  of  the  state  xs,  (which  are  simply  copied  from  the  preceding 
level),  while  noise  corresponding  to  the  estimation  error  (13)  is  added  to  the  second. 

Finally,  the  initial  covariance  matrix  Pq  associated  with  the  root  node  state  is  given  by: 


Po 


Zo 

Zo 

T\ 

Z0.5 

Zo.5 

l 

Z 1 

Zl 

l 

n0  n0$(o.5,o)r  n0$(i,o)T 

$(o.5,o)n0  n0.5  n0.5$(i,o.5)r 

§(1, 0)IIo  $(1,  0.5)IIo.5  ITi 


(19) 

(20) 


For  instance,  if  zt  is  a  Brownian  motion,  then  Ft  =  0,  $(f,r)  =  1,  II t  =  t  and: 


^t2\ti,t3 


h  -  t2  t2  -  t\ 

- - zt,  H - - 

is-  h  1  t3-t  i 

(^2  ~  ii)(^3  -  h) 

t3  —  ti 


(21) 

(22) 


Evaluating  these  at  t\  —  2ip/2m,t2  =  ( 2ip  +  l)/2m  and  t3  =  (2ip  +  2)/2m,  or  using  (16) 
-  (17),  we  have  Ki  =  K2  =  1/2.  Similarly,  from  (22)  and  (18),  K3  —  l/2^m+1^2.  The 
conditional  expectation  zt2 which  specifies  As  as  just  described,  also  provides  us  with 
the  required  formula  for  interpolating  between  dyadic  sample  points  at  any  level  in  our 
multiscale  representation  and  hence  we  can  interpret  this  representation  as  providing  a  se¬ 
quence  of  multiresolution  approximations.  For  example,  Brownian  motion  provides  us  with 
the  linear  interpolation  formula  given  in  (21)  and  illustrated  in  Figure  2.  This  corresponds 
to  a  multiresolution  linear  spline  approximation  or,  as  also  illustrated  in  Figure  2,  as  a 
non-orthogonal  multiresolution  decomposition  using  the  so-called  “hat”  function  [42]. 

As  a  second  example,  consider  the  movement  of  a  particle  whose  velocity  is  given  by  a 
Brownian  motion.  This  motion  can  be  described  using  the  following  Gauss-Markov  process: 


zt 


0  1 
0  0 


zt  + 


(23) 
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In  (23),  the  first  component  of  zt  is  the  particle  position  and  the  second  component  is  its 
velocity.  The  state  transition  matrix  §(f,r)  and  the  state  covariance  matrix  lit  follow  by 
straightforward  calculations.  Using  these,  one  can  show  that  the  terms  IItl  $(t2,  t\)T  and 
$(t3,  t2)ntj  in  the  leftmost  block  matrix  on  the  right  side  of  (12)  contain  only  cubic  powers 
of  £2.  Note  also  that  the  block  matrix  in  the  middle  of  the  right  side  does  not  depend  on 
f2.  Thus,  the  interpolation  of  zt2  between  and  t3  is  a  cubic  polynomial  in  t2: 

ci  +  c2t2  +  c3t|  +  c4t| 
c2  +  2  c3£2  -f  3  c$t\ 

where  from  (23),  the  second  component  of  zt2|tlit3  is  just  the  derivative  of  the  first.  It  is  clear 
from  the  definition  of  zt2  |il|t3  that  ztl  |tl|t3  =  ztl  and  =  zt3.  These  two  constraints 

provide  four  linear  equations  in  the  four  unknown  coefficients  in  (24),  and  thus  uniquely 
determine  the  interpolating  function  (24).  Note  that  the  interpolating  polynomial  for  the 
first  component  of  the  state  has  a  continuous  derivative  at  the  knot  locations  t  —  k/2m,  k  = 
0, 1,  •  •  • ,  2m.  The  interpolation  of  the  first  component  of  the  state  is  shown  in  Figure  8  for 
the  first  two  levels  of  a  sample  path  of  the  multiscale  realization. 


Zt2\tl,t3 


3.3.2  Discrete-State  Processes 


Next  consider  a  general  finite-state  Markov  process  zt£{l,2,--'Z}  defined  over  a  discrete 
interval  t  £  {0, 1,  •  •  • ,  T}.  The  probability  structure  of  the  process  is  completely  determined 
by  the  initial  condition  Pr[zo  =  &]  for  A:  £  {1,2,  ••■Z}  and  by  the  one-step  transition 
probabilities  Pij  =  Pr[,zt  =  i\zt-\  —  j].  We  define  the  one-step  transition  matrix: 


Pi,i  Pi, 2  ■  ■  •  P\,L 

Pi,l  P2,2  '  '  ■  P2,L 

Pl,  i  Pl,  2  ■  ■  ■  Pl,l 


(25) 


Note  that  the  multistep  transition  probabilities  are  given  by  powers  of  the  matrix11  P: 


pr  [zt+T  =  %\zt  =  j }  =  [ PT]i,j 


(26) 


Using  (26)  and  Bayes’  rule  it  is  straightforward  to  calculate  that  for  fi  <  t2  <  f3: 


Pr [ztj  =  j\zh  —  i,  zts  —  fc] 


[P^kjlP^ki 

[P^'Ki 


(27) 


These  conditional  probabilities,  in  addition  to  the  probability  function  required  for  the  state 
at  the  root  node  of  the  tree,  namely 


Pr[x0  =  i,  zT/ 2  =  j,  zT  =  k]  =  [PTl2}kij[PTl2}jiiPx[zo  =  z]  (28) 


allow  us  to  construct  the  multiscale  representation  of  the  process.  Note  that  (27)  is  the 
counterpart  of  the  conditional  probability  equations  for  Gauss-Markov  processes  given  in 

11[J4]ij  stands  for  the  (i,j)  element  of  the  matrix  A. 
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(12)  and  (13),  and  that  the  pdf  for  the  state  at  the  root  node  (28)  is  the  counterpart  of  the 
initial  covariance  matrix  (20). 

One  special  case  of  this  process  is  the  following: 


p. . 

1  1,1 

=  P 

(29) 

p.  ■ 

=  L-  1' 

(30) 

with  Pr[zo  =  i\  —  1  /L,i  —  1,2,  •••,£.  Neighboring  states  of  this  process  tend  to  be  the 
same,  and  when  the  process  does  change  state,  no  particular  change  is  preferred.  Thus,  this 
model  would  seem  to  be  a  natural  one  to  use  in  segmentation  applications  and  can  in  fact 
be  viewed  as  an  alternative  to  the  1-D  multiscale  model  (5)  introduced  in  [8,  9],  As  noted 
in  Section  2.2,  (5)  does  not  in  general  produce  a  Markov  chain  or  reciprocal  process  at  the 
finest  level.  On  the  other  hand  (29)  -  (30)  is  a  Markov  model,  with: 


[P%J 


ll  +  (L-l)$k)/L  if  i  =  j 
(1  -tik)/L  if  i  /  j 


(31) 


where  t?  =  ( Lfi  —  1  )/(L  —  1). 

Using  (27),  for  this  example  we  can  write  down  the  transition  probabilities  for  the  mid¬ 
point  deflection  model.  In  particular,  assuming  that  T  is  a  power  of  two,  we  can  associate 
the  state  at  node  s  with  the  following  values  of  the  process: 


xs  —  x(m,tp)  — 


z2<pT/2m 
z(2tfiT+T)/2m 
z(2tpT+2T)/2 m 


(32) 


where,  as  in  (14),  the  pair  of  numbers  ( m,<p )  denote  the  scale  and  horizontal  shift  of  the 
node  s,  respectively.  Thus,  to  generate  the  state  at  node  s,  given  the  state  at  the  parent 
node  37,  we  require  the  following  conditional  pdf: 

C1C1/C2  if  i  =  j  =  k 
66/6  if  i  ^  j  =  k 
66/6  if  *  =  (33) 

66/6  if  i=k^  j 
66/6  if  i,j,  k  distinct 


Prf^pT+T  =  j\Z2VT  =  iiZ2pT+2T  =  k]  —  < 
2m  2 171  2m  I 


where  for  ?  —  1,2,  ^  =  (1  +  (Z-  —  l)i?T/2m  ‘+1)/L  and  6  =  (1  -  t i1!2™  1+1)/L. 

To  gain  additional  insight  concerning  the  structure  of  our  multiscale  models,  consider 
the  particular  example  of  a  stationary  two-state  binary  process  with  one-step  transition 
matrix  and  initial  state  probabilities  equal  to: 


P  = 


1  ~  t*  V 
H  I-77 


For  this  process  one  can  show  that: 


pk 


1 

V  +  V 


r]  +  fi(l  -  rj  -  fi)k  J?  -  7(1  -  V  “  M)fc 
M  -  -  V  ~  V)k  n  +  Tj{l-rj-  p)k 


(34) 


(35) 
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and  thus  using  (35)  one  can  build  multiscale  representations  for  the  class  of  stationary 
binary  Markov  processes . 

Moreover,  the  mid-point  deflection  structure  can  also  be  used  to  generate  non-Markov 
processes  on  the  tree.  For  instance,  consider  the  following  binary  mid-point  “selection” 
process  defined  over  t  E  {0, 1,  •  •  -2^}  [43]: 

{H  if  i  -  j  =  k 

1  -  H  if  i  r  J  and  j  =  k  (36) 

0.5  if  j  ^  k 

with  Pr[zo  =  i,z2N-i  —  ],z2n  —  fc]  =  1/8  for  all  i,j,k  and  where  i,j,k  E  {1,2}  and  where 
ti,t3,t3  comprise  any  3-tuple  of  dyadic  points  corresponding  to  one  of  the  state  vectors  in  the 
multiscale  representation.  At  the  coarsest  scale  of  this  process,  the  three  components  of  the 
state  vector  Xq  are  independent  and  identically  distributed  random  variables,  each  equally 
likely  to  be  1  or  2.  It  is  easy  to  show  that  the  process  resulting  from  this  construction  is 
not  Markov  in  general,  and  thus  we  can  conclude  that  the  set  of  binary  stochastic  processes 
which  can  be  constructed  within  the  mid-point  deflection  framework  is  strictly  larger  than 
the  class  of  binary  Markov  processes  over  intervals. 

In  fact,  a  bit  of  thought  shows  that  the  class  of  processes  realizable  by  multiscale  mod¬ 
els  is  quite  a  bit  larger  than  the  class  of  Markov  chains.  Indeed,  any  binary  stochas¬ 
tic  process  defined  over  t  E  {0, 1,  •  •  • ,  2N}  when  represented  via  mid-point  deflection  has 
a  probability  structure  which  is  determined  by  4(2^  —  1)  parameters,  corresponding  to 
the  required  conditional  probability  functions.  In  particular,  the  conditional  probabilities 
Pr[zt2  =  i\ztl  =  j,ztl  =  k]  for  specific  choices  of  t\  <  t2  <  t2  are  uniquely  determined  by 
the  four  parameters  A ij,  ( i,j )  E  {(1, 1),  (1,2),  (2, 1),  (2, 2)},  where: 

Pr[zt2  =  l\ztl  =  i,zt3  =  j)  =  A  ij  (37) 

Since  the  process  is  represented  using  an  N  level  tree,  there  are  2^  —  2  of  these  conditional 
densities  which  must  be  specified,  corresponding  to  each  of  the  nodes  except  the  root  node. 
The  probability  function  for  the  state  at  the  root  node  requires  seven  parameters,  and 
thus  the  total  number  of  parameters  to  be  specified  is  4(2^  -  2)  +  7.  In  contrast,  a  non¬ 
stationary  binary  Markov  process  defined  over  the  time  interval  t  E  {0, 1,  •  •  • ,  2W}  requires 
at  most  1  +  2x2^  parameters  (one  corresponding  to  the  initial  probability,  and  2  for  each 
transition  from  t  to  t  +  1,  for  t  =  0, 1,  •  •  -2^  —  1).  Since  each  of  the  parameters  in  each  case 
is  a  probability,  i.e.  a  number  in  the  interval  [0,1],  we  see  that  the  set  of  processes  arising 
from  iV-level  multiscale  models  is  in  one-to-one  correspondence  with  the  (4(2^  —  2)  +  7)- 
dimensional  unit  cube,  while  the  set  of  non-stationary  Markov  chains  over  the  same  length 
interval  corresponds  to  the  (1  +  2  x  2^) -dimensional  unit  cube.  Thus,  for  N  >  1,  Markov 
processes  constitute  only  a  “thin”  subset  of  the  entire  class  of  binary  processes  constructed 
on  the  tree. 

4  Representation  of  2-D  Markov  Random  Fields 

In  this  section  we  first  review  a  few  of  the  properties  of  MRF’s  and  then  describe  how  they 
can  be  represented  exactly  using  multiscale  models.  We  then  use  these  exact  representations 
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to  motivate  a  family  of  approximate  representations  for  Gaussian  MRF’s  employing  1-D 
wavelet  transforms. 

4.1  2-D  Markov  Random  Fields 

Markov  random  fields  are  a  multidimensional  generalization  of  1-D  reciprocal  processes 
[4,  45,  29,  25,  22],  A  continuous  space  stochastic  process  zt,t  €  TZn  is  said  to  be  a  Markov 
random  field  if  the  conditional  probability  distribution  of  the  process  at  a  point  in  the 
interior12  Cl  \  T  of  a  closed  set  fi  with  boundary  T,  conditioned  on  the  values  of  the  process 
outside  of  0\T,  depends  only  on  the  values  of  the  process  on  the  boundary  set  T.  That  is, 
for  t  e  Cl  \  T: 


Pzt\zT,T£(o\r)c(Zt\ZT,T  e  (fi  \  T)c)  -  Pzt\zT,Tev(Zt\ZT,r  €  T)  (38) 

The  definition  for  MRF’s  on  discrete  lattices  requires  the  specification  of  the  notion  of  the 
“boundary”  of  a  set  in  Zn  [45,  22].  Typically,  this  is  accomplished  through  the  specification 
of  a  neighborhood  system.  The  idea  is  that  the  probability  distribution  of  zt,  conditioned 
on  the  values  of  the  process  on  the  rest  of  the  lattice,  depends  only  on  the  values  of  the 
process  in  the  neighborhood  of  t : 

Pzt\zTlTeZn\{t}{Zt\Zr,r  e  Zn  \{t})  =  pZt\Zr>TeDt(Zt\ZT,  r  £  Dt)  (39) 

In  this  paper,  we  focus  on  2-D  MRF’s,  i.e.  where  t  £  Z2,  and  in  this  context  there  is 
a  hierarchical  sequence  of  neighborhoods  frequently  used  in  image  processing  applications 
[10].  The  first  order  neighborhood  of  a  lattice  point  consists  of  its  four  nearest  neighbors 
(in  the  Manhattan  metric),  and  the  second-order  neighborhood  consists  of  its  eight  nearest 
neighbors.  A  given  neighborhood  system  implicitly  determines  the  boundary  set  of  any 
particular  region.  In  particular,  given  the  neighborhood  system  Dt,t  6  Z2,  the  boundary 
T  of  a  subset  Cl  of  Z2  is  given  by  the  set  of  points  which  are  neighbors  of  elements  in  f2, 
but  not  elements  of  Cl. 

4.2  Exact  Multiscale  Representations  of  2-D  Markov  Random  Fields 

The  representations  of  1-D  reciprocal  and  Markov  processes  in  Section  3  relied  on  the 
conditional  independence  of  regions  inside  and  outside  a  boundary  set,  and  we  use  the  same 
idea  here  to  represent  MRF’s.  The  multiscale  model  is  identical  to  that  used  in  the  1-D 
case,  except  that  it  is  defined  on  a  quadtree  instead  of  a  dyadic  tree.  That  is,  we  consider 
multiscale  models  in  which  s  denotes  a  node  on  the  quadtree  depicted  in  Figure  9  and  7 
is  a  four-to-one  operator,  i.e.  each  node  is  the  parent  of  four  descendant  nodes  at  the  next 
level. 

Consider  now  a  2-D  MRF  zt  defined  on  a  (2^  +  1)  x  (2^  +  1)  lattice.  The  construction 
of  reciprocal  processes  in  one-dimension  started  with  the  values  of  the  process  at  the  initial, 
middle  and  end  points  of  an  interval.  In  two  dimensions,  the  analogous  top  level  description 
consists  of  the  values  of  the  MRF  around  the  outer  boundary  of  the  lattice  and  along  the 
vertical  and  horizontal  “mid-lines”  which  divide  the  lattice  into  four  quadrants  of  equal  size. 

12The  notation  fi  \  T  denotes  the  set  of  elements  in  Q  which  are  not  in  F  (in  this  case,  the  interior  of  f2). 
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For  instance,  on  a  17  X 17  lattice,  the  state  vector  x0  at  the  root  node  of  the  quadtree  contains 
the  values  of  the  MRF  at  the  shaded  boundary  and  mid-line  points  shown  in  Figure  10.  The 
boundary  and  mid-line  points  are  denoted  with  O  and  o  symbols,  respectively.  In  general, 
the  state  at  the  root  node  is  a  (6  X  2^  —  3)-dimensional  vector  (given  some  ordering  of  the 
boundary  and  mid-line  lattice  points).  To  construct  a  sample  path  of  the  MRF,  we  begin  by 
choosing  a  sample  from  the  joint  pdf  of  the  MRF  values  defined  on  the  boundary  and  mid¬ 
line  set.  This  is  the  2-D  counterpart  to  choosing  a  sample  from  the  pdfp20iZ0  5>Zl  ( Zo ,  Z0.s ,  Z\) 
when  constructing  a  1-D  reciprocal  process. 

In  the  1-D  case,  transitions  from  the  first  to  second  level  consisted  of  obtaining  a  sample 
from  the  conditional  distribution  of  the  state  at  the  mid-points  of  the  left  and  right  half¬ 
intervals.  In  two  dimensions,  we  predict  the  set  of  values  at  the  mid-lines  in  each  of  the 
four  quadrants.  The  components  of  the  four  state  vectors  at  the  second  level  are  illustrated 
in  Figure  11  for  the  17  X  17  MRF.  The  points  corresponding  to  the  state  in  the  north-west 
corner  are  shaded,  and  correspond  to  a  scaled  and  shifted  version  of  the  points  at  the  top 
level.  The  boundary  points  of  the  north-west  state  are  denoted  with  open  and  blackened 
diamond  symbols  and  the  new  mid-line  points  are  denoted  with  open  circles.  Note  that  the 
four  states  at  the  second  level  share  the  black  diamond  mid-line  points  of  the  state  at  the 
first  level.  This  is  analogous  to  the  1-D  construction  in  which  the  mid-point  at  the  first  level 
corresponds  to  an  end  point  in  both  states  at  the  second  level  (cf.  Figure  4).  Each  of  the 
states  at  the  second  level  consists  of  points  carried  down  from  the  root  node  (namely  the 
diamond  boundaries  of  each  of  the  quadrants  in  Figure  11  as  well  as  new  mid-line  points 
within  each  quadrant  (the  open  circles  in  Figure  11).  These  mid-line  values  are  chosen  as 
samples  from  their  joint  conditional  distribution,  given  the  state  at  the  root  node.  The 
key  point  here  is  that  given  the  values  of  the  field  around  the  boundary  of  each  quadrant, 
the  values  of  the  field  along  the  mid-lines  of  that  quadrant  are  independent  of  the  values 
outside  this  quadrant.  Said  another  way,  the  four  states  at  the  second  level  of  the  tree  are 
conditionally  independent  given  the  values  of  the  MRF  on  their  respective  boundaries,  i.e. 
given  precisely  that  information  captured  in  the  state  at  the  first  level.  Thus,  the  values 
along  the  new  mid-lines  at  the  second  level  can  be  chosen  independently  and  in  parallel,  in 
analogy  to  the  way  the  two  mid-points  in  the  1-D  representations  are  chosen. 

Now,  we  can  iterate  the  construction  by  defining  the  states  at  successive  levels  to  be 
the  values  of  the  MRF  at  boundary  and  mid-line  points  of  successively  smaller  subregions. 
Indeed,  by  subdividing  each  quadrant  in  the  same  way  as  we  did  in  going  from  the  first  level 
to  the  second,  at  the  mth  level  the  4m_1  state  vectors  each  contain  the  values  of  the  MRF  at 
6  X  2iY~m4'1  -  3  boundary  and  mid-line  points.  Note  that  the  dimension  of  the  state  varies 
from  level  to  level,  reflecting  the  obvious  fact  that  the  number  of  points  in  the  boundary  of 
a  2-D  region  depends  on  the  size  of  the  region.  The  multiscale  representation  has  N  levels, 
and  each  of  the  AN~l  states  at  level  N  represent  9  values  in  a  3  X  3  square.  Because  of  the 
Markov  property,  at  each  level  the  states  are  conditionally  independent,  given  their  parent 
state  at  the  next  higher  level.  Thus,  the  MRF  can  be  thought  of  precisely  as  a  multiscale 
stochastic  process,  and,  in  the  Gaussian  case,  this  leads  to  models  exactly  as  in  (1). 

As  in  the  1-D  case,  there  are  several  comments  to  make.  First,  we  have  described  a 
construction  in  which  the  lattice  is  square.  If  the  MRF  is  defined  over  a  non-square  lattice, 
then  the  same  basic  approach  will  work.  In  particular,  all  we  require  is  some  sequence  of 
subregions  whose  boundaries  eventually  become  dense  in  the  set  of  lattice  points.  Likewise, 
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while  our  construction  applies  to  first  or  second  order  MRF’s,  higher-order  models  can  be 
represented  by  taking  as  state  the  values  of  the  process  along  boundaries  and  mid-lines 
of  “width”  greater  than  one.  Second,  just  as  our  1-D  multiscale  model  has  a  natural 
interpretation  in  terms  of  decimation  —  e.g.  if  the  points  on  the  finest  scale  correspond  to 
integers,  i.e.  to  Z ,  then  at  the  next  most  fine  scale  they  correspond  to  even  integers,  i.e. 
2 Z  —  so  does  our  2-D  model,  although  it  differs  from  the  usual  notion  of  decimation  in 
2-D.  Specifically,  if  the  points  on  the  finest  scale  correspond  to  Z2  =  Z  x  Z,  then  the  usual 
notion  of  decimation  would  be  2 Z  x  2 Z.  In  contrast,  the  notion  of  decimation  associated 
with  our  multiscale  models  yields  the  set  (2 Z  X  Z)  (J (Z  x  2 Z)  at  the  next  scale. 

Indeed,  the  obvious  difference  between  our  multiscale  MRF  representations  and  those  of 
[27,  26,  30]  is  that  these  latter  representations  do  correspond  to  multiscale  representations 
using  the  usual  notion  of  decimation.  That  is,  the  usual  decimation  leads  to  representa¬ 
tions  of  the  field  at  coarser  levels  which  correspond  roughly  to  2-D  lowpass  filtered  and 
subsampled  versions  of  that  at  the  finest  level.  Hence,  the  interpolating  functions  which 
generate  a  process  at  the  finest  level  from  a  coarse  scale  sample  naturally  correspond  to  2-D 
Haar  scaling  functions  or  more  generally  to  localized  interpolation  operators  such  as  those 
commonly  used  for  coarse-to-fine  grid  transfer  in  multigrid  applications.  In  contrast,  the 
interpolation  functions  in  our  representation  naturally  correspond,  in  the  case  of  Gaussian 
MRF’s,  to  the  solutions  of  specific  differential  (or  partial  differential)  equations  determined 
by  the  covariance  structure  of  the  process.  To  see  this  more  clearly,  note  that  the  linear 
spline  interpolation  formula  for  Brownian  motion  given  values  at  two  points  z0  =  Z0  and 
zt  =  Zt  is  simply  the  solution  to  the  second-order  differential  equation: 


d? 

dfiZt  l°’T 


0 


(40) 


Similarly  the  interpolation  of  the  first  component  of  the  second-order  process  (23)  corre¬ 
sponds  to  the  solution  of: 


^4^10,  T  -  0  (41) 

given  zo,z0,  zt  and  zt ■  The  2-D  example  analogous  to  the  linear  spline  model  for  Brownian 
motion  is  Laplace’s  equation  \j2z  —  0  given  values  of  z  on  the  boundary  of  a  square 
region,  while  the  counterpart  to  (41),  corresponding  to  a  second-order  model,  would  be  the 
solution  of  a  homogeneous  biharmonic  equation  y4i  —  0  given  boundary  values  and  normal 
derivatives  along  the  boundary  (see,  for  example  [44,  31],  for  related  discussions). 

Finally,  note  that  it  may  not  be  possible  to  explicitly  calculate  the  scale-to-scale  con¬ 
ditional  pdf’s  required  to  represent  an  MRF  which  is  specified  in  terms  of  local  (in  space) 
conditional  pdf’s.  Indeed,  even  if  this  were  possible  in  general,  it  is  unlikely  that  the  exact 
representations  of  MRF’s  would  lead  to  radically  more  efficient  algorithms  for  signal  process¬ 
ing,  since  in  this  case  the  scale-recursive  structure  comes  at  the  price  of  a  high  dimensional 
state.  Nevertheless,  these  exact  MRF  representations  provide  substantial  evidence  that 
the  multiscale  model  class  is  much  richer  than  its  simple  structure  suggests.  Moreover, 
as  we  show  in  the  next  section  in  the  context  of  Gaussian  MRF’s,  they  can  be  used  as  a 
guide  towards  other  far  more  parsimonious  multiscale  models  which  not  only  can  be  used 
to  represent  physical  processes  of  interest,  but  which  also  lead  to  efficient  algorithms. 
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4.3  Approximate  Multiscale  Representations  of  2-D  Gaussian  Markov 
Random  Fields 

In  this  section  we  propose  a  family  of  approximate  representations  for  Gaussian  MRF’s 
that  provide  low-dimensional  alternatives  to  exact  multiscale  models.  The  idea  behind  the 
approximate  representations  is  to  take  as  the  state  not  boundaries  of  regions,  but  rather 
some  reduced-order  representation  of  them.  Conceptually,  we  would  like  to  retain  only  those 
components  of  the  boundary  that  are  required  to  maintain  nearly  complete  conditional 
independence  of  regions.  In  general,  exact  conditional  independence  will  be  lost  unless  the 
entire  boundary  is  kept,  but  as  we  discuss  and  illustrate  here  and  in  the  next  section,  in 
many  cases  only  a  small  amount  of  information  needs  to  be  retained  in  order  to  obtain 
adequate  representations  of  the  important  statistical  and  qualitative  features  of  an  MRF. 

The  basis  for  our  approximation  methodology  is  a  change  of  coordinates  in  representing 
the  values  of  MRF’s  along  1-D  boundaries.  A  family  of  models  can  then  be  generated  by 
making  different  choices  for  the  set  of  coordinates  to  be  retained  and  those  to  be  discarded 
at  each  level  of  the  multiscale  representation.  These  models  range  from  being  exact  (if  all 
coordinates  are  retained)  to  increasingly  approximate  and  simple  as  fewer  and  fewer  coef¬ 
ficients  are  retained.  While  one  can  also  imagine  using  a  number  of  different  coordinate 
transformations,  such  as  1-D  Fourier  series  or  Karhunen-Loeve  expansions,  we  have  chosen 
here  to  make  a  choice  consistent  with  the  self-similar  structure  of  our  multiscale  represen¬ 
tations.  That  is,  we  will  use  the  1-D  wavelet  transform  to  represent  the  values  of  our  field 
along  1-D  boundaries. 

The  approximate  models  are  derived  from  a  class  of  non-redundant  exact  representations 
for  MRF’s  which  are  the  counterpart  of  those  illustrated  in  Figure  6  for  1-D  Markov  and 
reciprocal  processes.  In  particular,  the  states  at  the  first  and  second  levels  of  this  exact 
representation  for  an  MRF  defined  on  a  16  X  16  lattice  are  shown  in  Figures  12  and  13.  In 
a  multiscale  representation  of  an  MRF  defined  on  a  2N  x  2N  lattice,  a  state  at  the  mth  level 
represents  the  values  of  the  MRF  at  16(2iV-m  —  1)  points.  We  denote  this  set  of  points  as 
T„  and  we  view  it  as  the  union  of  four  mutually  exclusive  subsets.  In  particular,  consider 
the  112  points  associated  with  the  root  node  state  in  Figure  12.  We  can  view  these  as  four 
sets  of  28  points,  each  of  which  corresponds  to  the  boundary  of  one  8x8  quadrant.  In 
general,  we  can  divide  Ts  into  four  sets  of  4(2iV-rn(3)  -  1)  points  in  a  similar  fashion,  and 
we  denote  these  subsets  as  T„ti,i  £  {NW,  N E ,  SE,  SW},  where  the  subscripts  refer  to  the 
spatial  location  of  the  subset.  With  s  =  0  corresponding  to  the  root  node,  the  four  subsets 
To,;,  *  £  {NW,  NE,  SE,  5W}  are  illustrated  in  Figure  12  with  the  symbols: 


r0,jvTV  y,<],  [>,A,  and  combinations  of  these.  (42) 

ro,  ne  O  (43) 

To.SjE  D  (44) 

To.siv  <-*•  0  (45) 


Next,  we  interpret  the  set  of  values  {zt,t  £  rSij}  for  each  of  these  quadrant  boundaries, 
as  four  1-D  sequences  of  length  2N~m corresponding  to  each  of  the  sides  of  the  quadrant 
boundary.  Thus,  there  are  a  total  of  sixteen  1-D  boundary  sequences  associated  with  the  set 
r„  and  we  denote  these  as:  £  {NW,  N E,  SE,  SW},j  £  {hu,hl,vl,vr},  where  the 

latter  four  subscripts  refer  to  the  “horizontal,  upper”,  “horizontal,  lower”,  “vertical,  left” 
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and  “vertical,  right”,  respectively.  For  instance,  for  the  16  X  16  lattice,  the  sequences  0o,i,j 
are  shown  in  Figure  12.  Note  there  is  overlap  in  the  sequences  For  instance,  0o,NW,hu 

and  0o,NW,vi  both  contain  the  value  of  the  process  at  (0,0),  and  this  fact  is  reflected  in 
Figure  12  by  the  presence  of  both  V  and  t>  at  Ibis  lattice  point. 

Let  us  now  consider  the  simplest  of  our  approximate  models.  Specifically,  we  take  as 
the  state  of  the  approximate  representation  just  the  averages  of  the  sequences  03>ij.  The 
state  at  any  node  then  has  sixteen  components: 

xs,NW 

_  xs,NE 
xs  — 

xs,SE 
X3,SW 

where: 


IVo  S  3  ,i,vr 

Wof3s,iM 

Wofis,i,vl 


(47) 


for  i  E  { NW ,  NE,  SE ,  SW}  and  where  Wod3tij  denotes  the  average  of  the  sequence  03>ij{k). 
Given  the  definition  of  the  state  (46), (47)  (which  will  be  generalized  shortly  to  allow  general 
wavelet  transform  approximations  to  the  sequence  03,ij),  the  conditional  parent -offspring 
pdf’s  need  to  be  obtained  from  the  MRF  being  approximated.  Instead  of  using  these 
directly,  we  make  an  additional  approximation.  Let  us  define  the  downshift  operators 
Q-i,i  E  {NW,  NE,  SE,  SW},  which  are  the  counterparts  of  the  upshift  operator  7  de¬ 
fined  previously  (see  Figure  9).  In  particular,  we  denote  the  four  offspring  of  node  s  as 
sa.i,i  E  {NW,  N E,  SE,  SW},  where  the  subscript  refers  to  the  relative  spatial  location  of 
the  offspring.  In  the  exact ,  non-redundant  representations,  the  following  relationship  holds: 


Fzt,t€rjai|.zT,TGra('^f! ^  ^  Y s<Xi\Zr,T  6  F,)  - 

Pzt,terSai\zT,Terari(^t’^  ^  Taa;  \ZT,reYSii) 


for  i  E  {NW,  NE,  SE,  SW}.  What  (48)  says  is  that  the  conditional  pdf  for  the  state  at 
node  sa{  depends  only  on  a  subset  of  the  values  making  up  the  state  at  the  parent  node  s. 
For  example,  in  the  case  of  the  NW  offspring  of  node  s,  the  state  in  the  exact  representation 
at  node  sa^rw  (that  is,  zt,t  E  rsajvH,)  depends  only  on  the  NW  component  of  the  state  at 
node  s  (that  is,  on  the  values  zt,t  E  raijviv)-  Thus,  in  the  exact  representation  the  state  at 
node  sajyw  is  independent  of  the  values  of  the  MRF  at  the  points  in  TStNE,  Ys<se  and  F^sq/, 
given  the  values  at  In  contrast,  it  is  not  true  in  general  in  the  simple  approximate 

representations  just  described  that  the  state  xsaNW  is  independent  of  x3>ne,  %s,SE  and 
xg^sw ,  given  xSjnw-  That  is,  simply  knowing  the  average  value  of  a  process  along  each  side 
of  a  square  region  does  not  completely  decorrelate  the  values  of  the  field  inside  and  outside 
the  region.  Nevertheless,  in  our  approximate  modeling  framework  we  will  make  exactly  this 
assumption.  More  generally  and  precisely,  our  approximate  modeling  methodology  yields 
a  sequence  of  models  corresponding  to  differing  resolution  approximations  to  the  boundary 
processes  03tij{k),  where  (46)  -  (47)  corresponds  to  the  coarsest  of  these.  Using  the  same 
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symbols  xsi  and  xs  to  denote  the  state  components  and  state  of  any  of  these  models,  we 
construct  our  model  by  making  the  approximation  corresponding  to  assuming  that  the 
conditional  independence  property  holds,  i.e,  that: 


(X  SOLi  \Xs)  = 

'P&aa.i  |®3,t  (*  SCLi  \Xs,i) 


(49) 


Since  the  field  being  approximated  is  assumed  to  be  jointly  Gaussian,  the  conditional 
density  function  (49)  is  parameterized  by  conditional  means  and  covariances  as  in  (12)  and 
(13): 


Px, 


,  i{XSOLi\XS}i^  —  f\f{Xsai ,  xsai ,  PSOLi ) 


(50) 


where: 


xtai  —  Ei{xaai\xS}i}  (51) 

Paai  =  E{(a:«o:i  —  ®«a;)(:csai  ~~  ®sa;)  }  (52) 


One  can  then  derive  the  matrices  AS,BS  and  P0  in  the  multiscale  representation  of  the 
random  field: 


AsaNW  =  [Knw,  0, 0, 0]  (53) 

AsaNE  =  [0,  Kne,  0, 0]  (54) 

AsasE  =  [0,0,ifs£,0]  (55) 

Asasw  =  [0,0,0,ifsw]  (56) 

where: 

Ki  =  E{s,at.zJi}(E{*,iia£i})-1  (57) 


Likewise,  BsaiBja.  =  P„ai  and  P0  =  E{a:0a;o  }  The  assumption  (49)  is  directly  reflected 
in  (53)  -  (56).  In  particular,  the  state  xsai  is  a  function  only  of  the  ith  component  of 
the  parent  (cf.  (46)).  Thus,  the  assumption  in  (49)  leads  to  relatively  simple  level-to-level 
interpolations.  Indeed,  if  the  MRF  is  stationary ,  from  symmetry  we  see  that  not  only 
do  the  parameters  As,Be  depend  only  on  the  scale  of  node  s,  but  also,  K^w  =  Rjvb  = 
Kse  —  Ksw ■  Thus,  in  this  case,  the  representations  are  quite  parsimonious,  and  more 
importantly,  this  simple  structure,  in  addition  to  the  substantially  reduced  dimensionality 
of  the  approximate  representations,  leads  to  considerable  efficiencies  for  smoothing  [13,  14] 
and  likelihood  calculation  algorithms  [34,  36]. 

As  we  have  indicated,  the  generalization  of  the  coarsest  approximate  model,  with  state 
given  by  (46),  (47)  corresponds  to  using  wavelet  transforms  to  obtain  different  resolution 
representations  of  the  sequences  j3ttij(k).  We  utilize  the  wavelet  transform  for  discrete 
sequences  as  described  in  [6].  The  wavelet  transform  of  /33,i,j{k),k  €  (1,2,  ■  •  •,2iV-rTl(s)}  is 
a  set  consisting  of  a  single  “scaling”  coefficient  and  2Ar_mW  -  1  “wavelet”  coefficients13. 

13To  be  concrete,  we  assume  that  the  wavelet  transform  filter/downsample  operations  are  iterated  until 
the  sequence  of  scaling  coefficients,  i.e.  the  downsampled  output  of  the  lowpass  component  of  the  wavelet 
filter  bank,  is  of  length  one.  More  generally,  one  could  stop  at  any  point  in  the  decomposition. 
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These  are  computed  recursively  according  to14: 


ft1 

n—2M 

=  hnfn+ 2k-2 

Tl~  1 

(58) 

4_1 

n=2M 

—  X/  9nfn+2k-2 

n~  1 

(59) 

where  the  scaling  coefficients  and  wavelet  coefficients  are  fl  and  dJk  respectively,  hn,gn  are 
impulse  responses  of  quadrature  mirror  filters  [19]  of  length  2 M,  and  where  m(s)+1  = 
0S'ij(k).  We  say  that  a  pth-ordeT  representation  of  the  sequence  (3s,i,j{k)  is  a  set  consisting 
of  the  scaling  coefficient  and  the  wavelet  coefficients  up  to  order  p  in  the  wavelet  expansion, 
and  that  a  zeroth-order  representation  is  a  set  consisting  of  just  the  scaling  coefficient.  We 
denote  the  operator  which  maps  the  sequence  f33,i,j{k)  to  its  pt,l-order  representation  as 
Wp.  Note  that  if  p  —  N  —  m(s)  the  representation  is  complete,  since  it  contains  the  scaling 
coefficient  and  all  of  the  wavelet  coefficients.  For  p  >  N  -  m(s)  we  take  Wp  =  WN_m(s) 
(i.e.  if  there  are  fewer  than  p  scales  of  wavelet  coefficients,  we  keep  all  of  them). 

The  generalization  of  the  approximate  representation  based  on  averages  of  the  1-D 
sequences  /3Stij(k)  discussed  previously  now  just  involves  a  new  definition  for  the  state 
variables  xs.  In  particular,  we  simply  replace  (47)  with: 


^p0s,i,hu 

II  p/3S}i,v T 
Wp/^s,i,W 


(60) 


where  Wpf3s,i,j  denotes  the  pth- order  representation  of  the  sequence  f3s,i,j{k)  (a  vector  of 
length  2P  if  p  <  N  -  m(s)  and  of  length  2N~"m(s'>  if  p  >  N  -  m(s)).  Thus,  the  state  at  any 
given  node  consists  of  sixteen  components,  each  a  p^- order  representation  of  one  of  the  1-D 
boundary  sequences  /33iij(k)  associated  with  the  state  xs.  Using  this  generalized  definition 
for  the  state,  and  making  the  assumption  in  (49),  the  parameters  AS,BS  and  P0  can  be 
again  computed  in  the  essentially  same  way  as  we  did  for  the  simpler  approximate  models. 

Several  comments  are  in  order.  First,  note  that  a  simple  generalization  of  the  above  rep¬ 
resentation  would  be  to  allow  different  levels  of  approximation  for  different  components  of 
the  boundary  sequences  (e.g.  one  might  use  a  pf1- order  approximation  for  “vertical”  bound¬ 
ary  sequences  €  {vr,vl}  and  a  p^- order  approximation  for  “horizontal”  boundary 

sequences  £  {hu,  hi}).  Examples  of  such  a  generalization  will  be  given  in  the  next 

section  in  the  context  of  approximate  representations  for  MRF  texture  models. 

Second,  note  that  even  if  all  of  the  wavelet  coefficients  are  retained  at  all  levels  (i.e. 
if  the  boundary  representations  are  complete),  the  representation  we  have  just  described 
will  be  exact  only  if  the  GMRF  is  Markov  with  respect  to  either  a  first  or  second-order 
neighborhood.  As  we  have  discussed,  higher-order  neighborhoods  lead  to  thicker  boundaries, 
and  this  leads  naturally  to  the  idea  of  taking  wavelet  expansions  of  boundaries  of  width 

14Our  notation  is  slightly  different  from  that  in  [6],  In  particular,  in  [6],  increasing  superscript  j  cor¬ 
responds  to  lower  levels  in  the  decomposition  (i.e.,  fewer  wavelet  and  scaling  coefficients),  while  here  it 
corresponds  to  higher  levels. 
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two  or  more,  and  utilizing  these  as  the  state.  With  this  expanded  family,  the  approximate 
representations  can  be  made  exact  for  any  GMEF  by  keeping  complete  wavelet  expansions 
of  all  boundary  sequences  /3S|;,j(fc)  at  all  scales. 

Third,  not  only  has  dimensionality  been  reduced  in  going  from  the  exact  to  the  ap¬ 
proximate  representations,  but  it  has,  in  fact,  been  made  constant  at  the  first  N  ~  p  levels 
of  the  quadtree,  where  p  is  the  order  of  the  approximation  and  the  MRF  is  defined  on  a 
2n  X  2n  lattice.  In  particular,  the  dimension  of  the  state  at  node  s  is  equal  to  16  X  2P,  for 
m(s)  <  N  —  p.  When  m  =  N  —  p,  the  boundary  sequence  representations  are  complete  and 
the  dimension  of  the  state  drops  by  a  factor  of  2  at  each  level  thereafter. 

Finally,  the  order  of  the  approximations  required  to  achieve  a  desired  level  of  fidelity  in 
the  approximate  model  depends,  of  course,  on  the  statistical  structure  of  the  specific  GMRF 
under  study.  In  the  next  section  we  present  examples  which  illustrate  this  for  a  particular 
GMRF  and  a  number  of  different  approximate  representations. 

5  Examples  of  Approximate  2-D  Gaussian  MRF  Represen¬ 
tations 

In  this  section  we  present  examples  of  multiscale  approximate  representations  of  a  particular 
Gaussian  MRF.  GMRF’s  have  been  widely  used  in  the  context  of  texture  representation 
[10,  12,  17,  16,  37]  and  correspond  to  the  following  2-D  autoregressive  model  [11,  29]: 

zi,j  =  hktlZi-k,j-l  +  Zi,j  (61) 

(k,l)eD 

where  hk,i  —  h^k,~l>  D  is  the  neighborhood  [22]  around  the  origin  (0,0),  the  Gaussian  driv¬ 
ing  noise  eij  is  a  locally  correlated  sequence  of  random  variables,  and  (i,  j)  G  {0, 1,  •  •  • ,  T%  — 
1}  X  {0, 1,  •  •  .,T2  —  1}.  In  addition,  as  in  [11,  10],  we  interpret  the  lattice  as  a  toroid,  i.e. 
the  independent  variables  (i,j)  in  (61)  are  interpreted  modulo  ( Tx,T2 )•  For  instance,  the 
first-order  neighborhood  of  lattice  site  (0, 0)  is  given  by  the  set  {(1,  0),  (0, 1),  (0,  Ti  —  1),  (Ti  — 
1,0)}.  Finally,  the  correlation  structure  of  the  driving  noise  is  given  by: 

(a2  if  k  =  l  =  0 

E {e^je^k.j-l}  -  <  ~°2hk,l  if  (k,l)eD  (62) 

(  0  if  (k,l)#D 

and  has  the  property  that  E{eitjZk,i}  =  (T28iik6j,i .  Using  this  latter  property,  along  with  the 
fact  that  the  random  field  is  Gaussian,  one  can  prove  that  the  autoregressive  model  above 
does  imply  that  Zij  is  an  MRF  [45].  We  refer  to  (61)  as  an  nth- order  MRF  model  if  the  set 
D  corresponds  to  the  ni,l-order  neighborhood. 

The  specific  statistics  and  correlations  (as  in  (53)  -  (56))  required  to  construct  our 
multiscale  models  can  be  computed  efficiently  using  2-D  FFT’s  because  of  the  fact  that 
correlation  matrices  for  these  random  fields,  assuming  lexicographic  ordering,  are  block 
circulant  with  circulant  blocks  and  hence  these  random  fields  are  whitened  by  the  2-D  Fourier 
transform  [29].  Indeed,  as  described  in  [34],  the  structure  of  the  approximate  representations 
and  the  stationarity  of  the  GMRF  allow  us  to  compute  the  required  correlations  with  only 
2P  1-D  Fourier  transform  operations  per  level  of  the  representation,  where  p  is  the  order 
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of  the  approximation.  Furthermore,  these  calculations  need  only  be  performed  once,  since 
they  are  used  simply  to  determine  the  parameters  in  the  multiscale  approximate  model. 

Figure  14a  illustrates  a  sample  path  of  a  fourth  order  GMRF  corresponding  to  a  “wood” 
texture  [12],  and  three  approximations  of  this  MRF  based  on  the  Haar  wavelet  are  shown  in 
Figures  14b  -  14d.  This  texture  clearly  has  a  very  asymmetric  correlation  structure,  and  thus 
we  represent  the  vertical  and  horizontal  boundary  with  different  levels  of  approximation.  In 
Figure  14b,  the  horizontal  and  vertical  boundaries  are  represented  with  second  and  zeroth- 
order  approximations  respectively.  The  boundary  effects  apparent  in  Figure  14b  are  a  direct 
result  of  the  fact  that  (48)  does  not  hold  for  the  approximate  representations,  i.e.  values 
of  the  MRF  in  distinct  quadrants  are  not  independent  given  incomplete  information  about 
the  boundary  and  mid-line  values.  In  Figures  14c  and  14d,  the  horizontal  boundaries  are 
represented  with  fourth  and  sixth-order  approximations,  respectively,  whereas  the  vertical 
boundary  is  again  represented  with  a  zeroth-order  approximation.  As  the  complexity  of 
the  representation  increases,  the  sample  paths  of  the  approximate  random  fields  have  fewer 
boundary  effects.  The  approximate  representations  used  to  generate  Figures  14c  and  14d 
appear  to  accurately  represent  the  qualitative  and  statistical  features  of  the  MRF.  An 
interesting  point  here  is  that  the  level  of  representation  only  needs  to  be  increased  in  one 
direction  to  obtain  an  excellent  representation  of  the  field.  Also,  the  neighborhood  of 
this  MRF  is  fourth-order  and  thus  double  width  boundaries  would  be  needed  in  an  exact 
representation.  The  fields  shown  in  Figures  14b  to  14d,  however,  use  only  the  thinner 
boundaries  in  forming  states.  Several  experiments  were  done  in  which  we  used  the  double 
width  boundaries  in  forming  states  for  models  analogous  to  those  in  Figures  14b  to  14d. 
It  was  found,  however,  that  there  were  no  visual  differences  between  the  single  and  double 
width  approximate  representations.  Likewise,  approximations  of  the  “wood”  texture  based 
on  the  Daubechies  8  wavelet  [19]  were  also  visually  identical  to  their  Haar-representation 
counterparts.  That  is,  at  least  for  this  example,  and  for  the  others  we  have  examined,  the 
critical  issue  in  model  fidelity  appears  to  be  model  order  rather  than  the  particular  choice 
of  the  wavelet  used.  Furthermore,  as  these  examples  indicate,  we  can  achieve  quite  high 
quality  results  with  low-order  models,  which  in  turn  lead  to  extremely  efficient  algorithms 
as  in  [15,  13,  14,  35,  34,  36]. 

6  Discussion  and  Conclusions 

In  this  paper,  we  have  shown  how  to  represent  reciprocal  and  Markov  processes  in  one 
dimension  and  Markov  random  fields  in  two  dimensions  with  a  class  of  multiscale  stochastic 
models.  This  modeling  structure  provides  a  framework  for  the  development  of  efficient, 
scale-recursive  algorithms  for  a  variety  of  statistical  signal  processing  problems.  The  rep¬ 
resentations  in  1-D  rely  on  a  generalization  of  the  mid-point  deflection  construction  of 
Brownian  motion.  In  2-D,  we  introduced  a  “mid-line”  construction  which  leads  to  a  class  of 
models  with  scale- varying  dimension.  In  addition,  we  also  introduced  a  class  of  multiscale 
approximate  MRF  representations  based  on  1-D  wavelet  transforms.  This  family  allows 
one  to  tradeoff  complexity  and  accuracy  of  the  models,  and  provides  a  framework  for  the 
development  of  extremely  efficient  estimation  and  likelihood  calculation  algorithms.  An 
example  demonstrated  that  for  relatively  low-order  models,  an  approximate  model  which 
retains  most  of  the  qualitative  and  statistical  features  of  the  original  MRF  can  be  obtained. 
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We  feel  that  the  results  presented  in  the  preceding  section,  together  with  the  substantial 
flexibility  of  the  multiscale  modeling  framework,  demonstrate  the  promise  of  this  framework 
for  image  and  multidimensional  signal  processing.  Indeed  practical  applications  of  this 
framework  are  already  emerging,  as  in  the  segmentation  and  image  sequence  processing 
applications  described  in  [8,  9,  35].  In  addition,  in  [34,  36]  we  demonstrate  the  superior 
performance  of  likelihood-based  texture  identification  methods  using  low-order  versions  of 
the  models  introduced  in  Section  4  —  where  by  “superior”  we  mean  that  the  algorithm 
based  on  our  multiscale  models  has  significantly  better  probability  of  error  characteristics 
than  well-known  methods  such  as  those  in  [10],  and  achieves  virtually  the  same  performance 
as  the  truly  optimal  GMRF-based  likelihood  ratio  test,  which,  except  in  special  cases,  is 
prohibitively  complex  computationally  in  problems  of  even  moderate  size. 
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Figure  1:  This  graph,  taken  from  [35],  shows  for  one  example  that  the  multiscale  regular¬ 
ization  (MR)  approach  and  two  smoothness  constraint  based  iterative  approaches  to  the 
problem  of  computing  optical  flow  in  an  image  sequence  yield  comparable  rms  errors  (the 
rms  error  corresponding  to  the  non-iterative  MR  algorithm  is  shown  as  a  straight  line). 
This  result  is  typical  of  experiments  on  several  real  and  synthetic  image  sequences  in  [35]. 
The  multiscale  approach  requires  total  computation  equivalent  to  4.2  SOR  iterations  and 
hence  provides  a  substantial  computational  gain. 


Figure  2:  The  first  two  levels  of  a  “mid-point  deflection”  construction  of  a  Brownian  motion 
sample  path  are  shown  on  the  left.  The  construction  generates  a  sequence  of  approximations 
based  on  linear  interpolations  of  samples  of  the  Brownian  motion  at  the  dyadic  points.  On 
the  right,  the  basis  functions,  integrals  of  the  Haar  wavelet,  in  this  construction  are  shown. 
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Figure  3:  The  state  vectors  in  multiscale  stochastic  models  are  indexed  by  the  nodes  of  a 
dyadic  tree.  The  tree  is  a  set  of  connected  nodes,  in  which  each  node  has  two  offspring.  The 
parent  of  node  s  is  denoted  s- y  and  the  scale,  or  level,  of  node  s  is  denoted  by  m(s ). 


b, 


Scale 


m  =  1 


m  =  2 

m  =  3 


Figure  4:  The  state  vectors  for  the  first  three  levels  of  a  multiscale  model  representing 
Brownian  motion,  bt,  are  illustrated.  At  the  first  level,  the  state  is  the  vector  [6o,  &0.5,  &i] , 
which  is  indicated  by  the  three  points  at  m  =  1  surrounded  by  an  ellipse.  The  points  are 
placed  directly  below  the  points  t  =  0, 0.5  and  t  —  1  on  the  graph  above  to  indicate  that  the 
state  of  the  multiscale  process  at  the  first  level  consists  of  the  values  of  the  Brownian  motion 
at  those  three  points.  Likewise,  at  lower  levels,  the  states  are  indicated  by  sets  of  three 
points  surrounded  by  ellipses,  with  the  horizontal  location  of  the  points  in  correspondence 
with  time  indices  in  the  graph  at  the  top.  At  the  mth  level,  there  are  2Tn~1  state  vectors,  each 
of  which  consists  of  the  values  of  bt  at  three  consecutive  dyadic  points,  and  which  together 
represent  the  values  of  the  Brownian  motion  at  2m  +  1  distinct  points  on  the  interval  [0, 1]. 
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Figure  5:  The  state  vectors  are  shown  for  two  possible  multiscale  representations  for  a 
reciprocal  process  defined  on  a  discrete  interval  of  the  form  {0, 1,  •  •  * ,  10}.  In  (a),  a  dyadic 
tree  with  uniform  state  dimension,  but  non-uniform  depth  is  used,  whereas  in  (b)  a  dyadic 
tree  of  uniform  depth  but  non-uniform  state  size  is  used. 


Scale  0  15 


Figure  6:  The  state  vectors  are  shown  for  a  non-redundant  multiscale  representation  of 
a  1-D  reciprocal  process.  These  non-redundant  representations,  appropriately  generalized 
for  the  2-D  case,  are  useful  in  the  context  of  wavelet-based  approximate  representations  of 
Gaussian  MRF’s. 
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Scale  0  27 


Figure  7:  The  state  vectors  are  shown  for  a  multiscale  representation  on  a  third-order  tree. 


Figure  8:  The  first  two  scales  in  a  multiscale  representation  of  a  process  which  is  equal  to 
the  second  integral  of  white  noise  are  shown.  The  representation  consists  of  samples  of  the 
process  at  dyadic  points  along  with  a  piecewise- cubic  interpolation.  Compare  these  curves 
with  the  graphs  of  Figure  2,  which  depict  the  piecewise  linear  interpolation  of  the  first 
integral  of  white  noise. 
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Figure  9:  The  quadtree  structure  shown  is  used  for  the  multiscale  representations  of 
Markov  random  fields  (MRF’s),  Each  node  of  the  quadtree  has  four  offspring,  denoted 
sajvw ,  S&NE,  s&SE,  s&sw ■  Again,  the  parent  of  node  s  is  denoted  s- y,  and  in  this  case  7  is 
a  four-to-one  shift  operator. 


O  Boundary  points  0  Mid-line  points 


Figure  10:  The  state  vector  at  the  root  node  in  the  MRF  multiscale  representation  consists 
of  the  MRF  values  at  the  boundary  and  “mid-line”  points,  shown  in  the  shaded  region  here 
for  a  17  X  17  lattice.  To  construct  a  sample  path  of  the  MRF  using  the  “mid-line”  deflection 
construction,  we  start  by  choosing  a  sample  from  the  joint  distribution  of  the  values  in  the 
root  node  state. 
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O  Boundary  points  at  both  first  and  second  levels 
♦  Boundaiy  points  at  the  second  level  and  mid-line 
points  at  the  first  level 
o  New  (second  level)  mid-line  points 


Figure  11:  The  components  of  the  four  state  vectors  at  the  second  level  of  the  tree  are  scaled 
and  shifted  versions  of  the  components  of  the  state  at  the  root  node.  For  instance,  the  state 
corresponding  to  the  north-west  corner  at  the  second  level  of  a  representation  for  an  MEF 
defined  on  a  17  X  17  lattice  consists  of  the  values  of  the  process  at  the  shaded  points.  The 
values  of  the  MRF  at  the  boundary  points  in  these  second  level  states  are  mapped  down 
from  the  root  node  state,  and  the  values  at  the  new  mid-lines  in  each  of  the  four  quadrants 
are  chosen  independently.  In  particular,  the  new  mid-line  values  in  any  given  quadrant  are 
independent  of  values  of  the  MRF  outside  that  quadrant,  given  the  boundary.  Thus,  in 
the  construction  of  a  sample  path,  we  can  choose  values  along  each  of  the  four  sets  of  new 
mid-lines  independently  and  in  parallel.  This  process  can  then  be  iterated,  by  defining  the 
states  of  the  multiscale  process  at  lower  levels  in  the  quadtree  with  respect  to  successively 
smaller  subdomains,  and  constructing  the  process  (along  boundary  and  mid-line  points) 
independently  within  each  subdomain. 
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Figure  12:  The  state  at  the  root  node  in  a  non-redundant  exact  multiscale  representation 
of  an  MRF  defined  on  a  16  X  16  lattice  consists  of  the  values  of  the  process  at  the  shaded 
points.  The  redundancy  in  the  exact  representation  is  eliminated  by  generating  the  values 
of  the  process  along  two  mid- lines  instead  of  one.  The  figure  also  illustrates  the  sets  rSj;, 
and  the  sequences  (3s,i,j{k )  defined  in  the  context  of  approximate  representations  in  Section 
4.3.  The  are  1-D  sequences  corresponding  to  values  of  the  MRF  along  boundaries 

of  square  subdomains  (which,  at  the  first  level,  are  the  white  areas  in  the  figure).  These 
sequences  overlap  at  the  corner  points  of  boundaries.  In  the  figure,  this  is  represented  by 
putting  two  symbols  at  the  same  lattice  point,  e.g.  v  and  t>  in  the  upper  left  corner. 
The  approximate  representations  take  as  the  state  subsets  of  the  coefficients  in  1-D  wavelet 
expansions  of  the  /33tij(k)  sequences. 
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Figure  13:  The  four  states  at  the  second  level  of  the  tree  in  a  non-redundant  exact  multiscale 
representation  are  scaled  and  shifted  versions  of  the  state  at  the  root  node,  and  are  shown 
here  for  an  MRF  defined  on  a  16  X  16  lattice.  The  state  in  the  north-west  corner  contains 
the  values  of  the  process  at  the  shaded  points  in  the  north-west  8x8  quadrant.  With  the 
node  s  corresponding  to  this  north-west  corner  state,  the  sets  and  sequences  (33tNW,j 
are  illustrated.  Note  again  that  the  sequences  overlap. 
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Figure  14:  A  sample  path  of  a  Gaussian  MRF  representing  the  “wood”  texture  of  [12]  is 
shown  in  (a).  Parts  (fe)  —  (d)  illustrate  sample  paths  of  approximate  representations  of  the 
MRF  based  on  the  Haar  wavelet.  The  structure  of  the  MRF  suggests  using  approximations 
which  use  relatively  low  order  representations  of  vertical  boundaries.  The  approximate  rep¬ 
resentations  used  to  generate  (b)  —  (d)  used  zeroth-order  representations  of  the  vertical 
boundaries,  and  second,  fourth  and  sixth-order  representations  for  the  horizontal  bound¬ 
aries,  respectively. 
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